Build your own operating system #07_virtual_memory_and_paging
This is the 7th article of the “Build your own operating system” article series. In the previous article, we discussed how to create a user mode for executing user programs, in contrast with kernel mode. I hope you managed to do it successfully. In this article, we are going to learn how to implement virtual memory and paging in our operating system.
Let’s go.
Introduction to Virtual Memory
Simply put, virtual memory is an abstraction of physical memory. One well-known use of it is extending RAM with secondary storage: the operating system compensates for a shortage of physical memory by transferring data from random access memory (RAM) to disk storage (HDD).
There are two ways to accomplish this on the x86 architecture: segmentation and paging. Paging is by far the most common and versatile technique, and it is the method we will use to implement virtual memory in our operating system. But remember, some use of segmentation is still required to allow code to execute under different privilege levels.
Read more about Virtual memory
Paging
Segmentation translates a logical address into a linear address. Paging, on the other hand, translates linear addresses onto the physical address space.
Paging in x86
There are a few prerequisites you need to be aware of before learning about paging on the x86 architecture. Paging in x86 consists of a page directory (PDT) that can contain references to 1024 page tables (PT), each of which can point to 1024 sections of physical memory called page frames (PF). A single page frame is 4096 bytes in size. In a linear address, the highest 10 bits specify the offset of a page directory entry (PDE) in the current PDT, the next 10 bits specify the offset of a page table entry (PTE) within the page table pointed to by that PDE, and the lowest 12 bits are the offset within the page frame to be addressed.
All page directories, page tables, and page frames must be aligned on 4096-byte boundaries. Because the lowest 12 bits of such an aligned 32-bit address are always zero, the highest 20 bits are enough to address a PDT, PT, or PF.
While most pages are 4096 bytes in size, 4 MB pages are also available. In that case, a PDE points directly to a 4 MB page frame, which must be aligned on a 4 MB address boundary. The address translation is nearly identical to that shown in the figure, except that the page table step is skipped. Pages of 4 MB and 4 KB can be mixed.
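The 10/10/12 split described above can be sketched in C. This is only an illustration of the bit layout; the helper names (pde_index, pte_index, page_offset, linear_from_parts) are made up for this example and are not part of any existing code in the series.

```c
#include <assert.h>
#include <stdint.h>

/* Highest 10 bits: which entry of the page directory is used. */
static inline uint32_t pde_index(uint32_t linear)
{
    return linear >> 22;
}

/* Next 10 bits: which entry of the selected page table is used. */
static inline uint32_t pte_index(uint32_t linear)
{
    return (linear >> 12) & 0x3FFu;
}

/* Lowest 12 bits: byte offset inside the 4096-byte page frame. */
static inline uint32_t page_offset(uint32_t linear)
{
    return linear & 0xFFFu;
}

/* Rebuilds a linear address from its three parts (hypothetical helper). */
static inline uint32_t linear_from_parts(uint32_t pde, uint32_t pte,
                                         uint32_t offset)
{
    assert(pde < 1024u && pte < 1024u && offset < 4096u);
    return (pde << 22) | (pte << 12) | offset;
}
```

For example, the address 0xC0100ABC splits into PDE index 768, PTE index 256, and offset 0xABC.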
The translation of linear addresses to physical addresses is described in the figure below.
The 20 bits pointing to the current PDT are stored in the register cr3. The lower 12 bits of cr3 are used for configuration.
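To make the translation steps concrete, here is a rough software model of the walk the CPU performs, written in C. It is a sketch under assumptions: "physical memory" is simulated by a plain array of 32-bit words, the function and macro names are invented for this example, and only the present bit (bit 0) and the 4 MB page-size bit of a PDE (bit 7) are considered.

```c
#include <assert.h>
#include <stdint.h>

#define ENTRY_PRESENT 0x001u        /* bit 0: entry is present */
#define PDE_PAGE_4MB  0x080u        /* bit 7 of a PDE: 4 MB page */
#define ADDR_MASK_4KB 0xFFFFF000u   /* highest 20 bits */
#define ADDR_MASK_4MB 0xFFC00000u   /* highest 10 bits */

/* Reads one 32-bit entry from the simulated physical memory. */
static uint32_t read_entry(const uint32_t *mem, uint32_t phys_addr)
{
    assert(phys_addr % 4u == 0u);
    return mem[phys_addr / 4u];
}

/* Returns 1 and stores the result in *phys when the address is mapped,
 * or 0 where real hardware would raise a page fault. */
static int translate(const uint32_t *mem, uint32_t cr3,
                     uint32_t linear, uint32_t *phys)
{
    uint32_t pdt = cr3 & ADDR_MASK_4KB;   /* low 12 bits are flags */
    uint32_t pde = read_entry(mem, pdt + 4u * (linear >> 22));

    if (!(pde & ENTRY_PRESENT))
        return 0;
    if (pde & PDE_PAGE_4MB) {             /* page table step is skipped */
        *phys = (pde & ADDR_MASK_4MB) | (linear & ~ADDR_MASK_4MB);
        return 1;
    }

    uint32_t pt  = pde & ADDR_MASK_4KB;
    uint32_t pte = read_entry(mem, pt + 4u * ((linear >> 12) & 0x3FFu));

    if (!(pte & ENTRY_PRESENT))
        return 0;
    *phys = (pte & ADDR_MASK_4KB) | (linear & 0xFFFu);
    return 1;
}
```

The real hardware does all of this on its own for every memory access; the model is only useful for checking by hand what a given set of tables would do.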
Identity paging
The simplest kind of paging is when we map each virtual address onto the same physical address, called identity paging. This can be done at compile time by creating a page directory where each entry points to its corresponding 4 MB frame. In NASM this can be done with macros and commands (%rep, times, and dd). It can of course also be done at run-time by using ordinary assembly instructions.
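As an illustration of what such an identity-mapped page directory contains, here is a small C sketch that fills one at run-time using 4 MB pages. The function name and the flag macros are made up for this example; the flag values assume the usual x86 PDE layout (bit 0 present, bit 1 writable, bit 7 page size). In a real kernel the array would also have to be aligned on a 4096-byte boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PDE_PRESENT  0x001u   /* bit 0 */
#define PDE_WRITABLE 0x002u   /* bit 1 */
#define PDE_PAGE_4MB 0x080u   /* bit 7: entry maps a 4 MB page */
#define PDE_COUNT    1024u

/* Entry i maps the 4 MB region starting at i * 4 MB onto itself. */
static void fill_identity_directory(uint32_t *pd)
{
    assert(pd != NULL);
    for (uint32_t i = 0; i < PDE_COUNT; i++) {
        pd[i] = (i << 22) | PDE_PAGE_4MB | PDE_WRITABLE | PDE_PRESENT;
    }
}
```

With 1024 entries of 4 MB each, the whole 4 GB address space is covered, so every virtual address translates to the identical physical address.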
Enabling Paging
Paging is enabled by first writing the address of a page directory to cr3 and then setting bit 31 (the PG “paging-enable” bit) of cr0 to 1. To use 4 MB pages, set the PSE bit (Page Size Extensions, bit 4) of cr4. The following assembly code shows an example:
; eax has the address of the page directory
mov cr3, eax
mov ebx, cr4 ; read current cr4
or ebx, 0x00000010 ; set PSE
mov cr4, ebx ; update cr4
mov ebx, cr0 ; read current cr0
or ebx, 0x80000000 ; set PG
mov cr0, ebx ; update cr0
; now paging is enabled
Note that all addresses within the page directory, page tables, and cr3 need to be physical addresses of the structures, never virtual ones. This will become more relevant in later sections where we dynamically update the paging structures.
An instruction that is useful when updating a PDT or PT is invlpg. It invalidates the Translation Lookaside Buffer (TLB) entry for a virtual address. The TLB is a cache for translated addresses, mapping virtual addresses to their corresponding physical addresses. Invalidation is only required when changing a PDE or PTE that was previously mapped to something else. If the PDE or PTE had previously been marked as not present (bit 0 was set to 0), executing invlpg is unnecessary. Changing the value of cr3 will cause all entries in the TLB to be invalidated.
The example below shows how to invalidate a TLB entry:
; invalidate any TLB references to virtual address 0
invlpg [0]
Paging and the kernel
In this part, we will discuss how paging affects the OS Kernel. We encourage you to run your OS using identity paging before trying to implement a more advanced paging setup since it can be hard to debug a malfunctioning page table that is set up via assembly code.
Reasons to Not Identity Map the Kernel
If the kernel is placed at the beginning of the virtual address space, that is, if the virtual address range (0x00000000, “size of the kernel”) maps to the location of the kernel in memory, there will be a problem when linking user-mode process code. Normally the linker assumes that the code will be loaded at memory location 0x00000000, so 0x00000000 will be the base address for resolving absolute references. This means the user-mode process can’t be loaded at this virtual address, because the kernel is already mapped there.
Using a linker script that instructs the linker to assume a different starting address would fix this, but it is a cumbersome solution for the users of the operating system to deal with.
We’re also assuming that we want the kernel’s address space to be part of the user-mode process’s address space in this scenario. As we’ll see later, this is a nice feature, because we don’t have to alter any paging structures in order to access the kernel’s code and data. The kernel pages require privilege level 0 for access, so a user process cannot read or write kernel memory.
The Virtual Address for the Kernel
Ideally, the kernel should be placed at a very high virtual memory address, for example 0xC0000000 (3 GB). The user-mode process is not likely to be 3 GB large, which is now the only way it can conflict with the kernel. When the kernel uses virtual addresses at 3 GB and above, it is called a higher-half kernel. 0xC0000000 is just an example; the kernel can be placed at any address higher than 0 to get the same benefits. Choosing the right address depends on how much virtual memory should be available to the kernel (it is easiest if all memory above the kernel’s virtual address belongs to the kernel) and how much virtual memory should be available to the process.
If the user-mode process is larger than 3 GB, some pages will need to be swapped out by the kernel. Swapping pages is outside the scope of this series.
Placing the Kernel at 0xC0000000
The kernel should be placed at 0xC0100000 rather than at 0xC0000000, because this makes it possible to map (0x00000000, 0x00100000) to (0xC0000000, 0xC0100000). This way, the entire memory range (0x00000000, “size of the kernel”) is mapped to the range (0xC0000000, 0xC0000000 + “size of the kernel”).
Placing the kernel at 0xC0100000 isn’t hard, but it does require some thought. Once again, this is a linking problem. Because we use relocation in the linker script (see “Linking the Kernel”), the linker will assume that our kernel is loaded at the physical memory address 0x00100000, not 0x00000000, when it resolves absolute references. However, we want the jumps to be resolved using 0xC0100000 as the base address, since otherwise a kernel jump will jump straight into the user-mode process code (remember that the user-mode process is loaded at virtual memory 0x00000000).
However, we can’t simply tell the linker to assume that the kernel starts (is loaded) at 0xC0100000, because we want it to be loaded at the physical address 0x00100000. The kernel is loaded at 1 MB because it can’t be loaded at 0x00000000: there is BIOS and GRUB code loaded below 1 MB. And since the computer might have less than 3 GB of physical memory, we can’t assume that we’ll be able to load the kernel at 0xC0100000 either.
As a workaround, the linker script can use both relocation (. = 0xC0100000) and the AT instruction. Relocation specifies that non-relative memory references should use the relocation address as the base in address calculations. AT specifies where the kernel should be loaded into memory. Relocation is done at link time by GNU ld, while the load address specified by AT is handled by GRUB when loading the kernel.
Higher-half Linker Script
Modify your linker script to this:
ENTRY(loader)                /* the name of the entry label */

SECTIONS {
    . = 0xC0100000;          /* the code should be relocated to 3 GB + 1 MB */

    /* AT(...) makes each section load at its address minus 0xC0000000 (from 1 MB) */
    .text ALIGN (0x1000) : AT(ADDR(.text) - 0xC0000000)     /* align at 4 KB */
    {
        *(.text)             /* all text sections from all files */
    }
    .rodata ALIGN (0x1000) : AT(ADDR(.rodata) - 0xC0000000) /* align at 4 KB */
    {
        *(.rodata*)          /* all read-only data sections from all files */
    }
    .data ALIGN (0x1000) : AT(ADDR(.data) - 0xC0000000)     /* align at 4 KB */
    {
        *(.data)             /* all data sections from all files */
    }
    .bss ALIGN (0x1000) : AT(ADDR(.bss) - 0xC0000000)       /* align at 4 KB */
    {
        *(COMMON)            /* all COMMON sections from all files */
        *(.bss)              /* all bss sections from all files */
    }

    kernel_end = .;          /* virtual address of the end of the kernel */
}
Entering the Higher Half
When GRUB jumps to the kernel code, there is no paging table. Therefore, all references to 0xC0100000 + X won’t be mapped to the correct physical address, and will cause a general protection exception (GPE) at the very best; otherwise (if the computer has more than 3 GB of memory) the computer will just crash.
So, assembly code that does not depend on the link-time (higher-half) absolute addresses, which are not mapped yet, must be used to do the following:
- Set up a page table.
- Add identity mapping for the first 4 MB of the virtual address space.
- Add an entry for 0xC0100000 that maps to 0x00100000.
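The contents of the table built by the steps above can be sketched in C. This is only a model of the data: in the kernel itself this setup has to happen in assembly before paging is enabled. The sketch assumes 4 MB pages, so a single page directory is enough, and the names (fill_boot_directory, drop_identity_mapping, KERNEL_VIRTUAL_BASE) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KERNEL_VIRTUAL_BASE 0xC0000000u
#define PDE_PRESENT  0x001u   /* bit 0 */
#define PDE_WRITABLE 0x002u   /* bit 1 */
#define PDE_PAGE_4MB 0x080u   /* bit 7: entry maps a 4 MB page */

/* Maps physical 0..4 MB twice: at virtual 0 (identity) and at 3 GB. */
static void fill_boot_directory(uint32_t *pd)
{
    const uint32_t flags = PDE_PAGE_4MB | PDE_WRITABLE | PDE_PRESENT;

    assert(pd != NULL);
    for (uint32_t i = 0; i < 1024u; i++) {
        pd[i] = 0;                                 /* not present */
    }
    pd[0] = 0x00000000u | flags;                   /* identity mapping */
    pd[KERNEL_VIRTUAL_BASE >> 22] = 0x00000000u | flags; /* higher half */
}

/* Once eip points into the higher half, the identity entry can go.
 * A real kernel would also run invlpg for the stale TLB entry. */
static void drop_identity_mapping(uint32_t *pd)
{
    assert(pd != NULL);
    pd[0] = 0;
}
```

Entry 768 (0xC0000000 >> 22) covers virtual addresses 0xC0000000 to 0xC03FFFFF, so 0xC0100000 ends up at physical 0x00100000 as required.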
If we skip the identity mapping (the second step above), the CPU will generate a page fault immediately after paging is enabled, when it tries to fetch the next instruction from memory. After the table has been created, a jump can be done to a label to make eip point to a virtual address in the higher half:
; assembly code executing at around 0x00100000
; enable paging for both actual location of kernel
; and its higher-half virtual location
lea ebx, [higher_half] ; load the address of the label in ebx
jmp ebx ; jump to the label
higher_half:
; code here executes in the higher half kernel
; eip is larger than 0xC0000000
; can continue kernel initialisation, calling C code, etc.
The register eip will now point to a memory location somewhere right after 0xC0100000, and all the code can now execute as if it were located at 0xC0100000, the higher half. The entry mapping the first 4 MB of virtual memory to the first 4 MB of physical memory can now be removed from the page table, and its corresponding entry in the TLB invalidated with invlpg.
Running in the higher half
There are a few more details we must deal with when using a higher-half kernel. We must be careful when using memory-mapped I/O that uses specific memory locations. For example, the frame buffer is located at 0x000B8000, but since there is no longer an entry in the page table for the address 0x000B8000, the address 0xC00B8000 must be used, since the virtual address 0xC0000000 maps to the physical address 0x00000000.
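One way to keep such address adjustments in one place is a pair of small conversion helpers, sketched here in C. The names are made up for this example, and the sketch assumes the simple setup described in this article, where only the first 4 MB of physical memory are mapped at 0xC0000000.

```c
#include <assert.h>
#include <stdint.h>

#define KERNEL_VIRTUAL_BASE 0xC0000000u
#define KERNEL_MAPPED_SIZE  0x00400000u   /* assumption: one 4 MB page */

/* Physical address -> the virtual address the kernel must use. */
static inline uint32_t phys_to_virt(uint32_t phys)
{
    assert(phys < KERNEL_MAPPED_SIZE);
    return phys + KERNEL_VIRTUAL_BASE;
}

/* Higher-half virtual address -> the physical address behind it. */
static inline uint32_t virt_to_phys(uint32_t virt)
{
    assert(virt >= KERNEL_VIRTUAL_BASE);
    return virt - KERNEL_VIRTUAL_BASE;
}
```

With these, the frame buffer would be reached through phys_to_virt(0x000B8000), which gives 0xC00B8000.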
Any explicit references to addresses within the multiboot structure need to be changed to reflect the new virtual addresses as well.
Mapping 4 MB pages for the kernel is simple, but wastes memory (unless you have a really big kernel). Creating a higher-half kernel mapped in as 4 KB pages saves memory, but is harder to set up. Memory for the page directory and one page table can be reserved in the .data section, but the mappings from virtual to physical addresses need to be configured at run-time. The size of the kernel can be determined by exporting labels from the linker script, which we’ll need to do later anyway when writing the page frame allocator.
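A rough C sketch of the 4 KB variant follows: it fills a single page table so that only the frames actually occupied by the kernel are mapped. The function name is hypothetical, and the physical start and end addresses are passed in as plain numbers here; in a real kernel they would come from labels exported by the linker script. The page table is assumed to be the one referenced by the PDE for 0xC0000000, so physical address P is reached at virtual address 0xC0000000 + P.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE    4096u
#define PTE_PRESENT  0x001u   /* bit 0 */
#define PTE_WRITABLE 0x002u   /* bit 1 */

/* Maps [phys_start, phys_end) with 4 KB pages; returns the page count. */
static uint32_t map_kernel_4kb(uint32_t *pt, uint32_t phys_start,
                               uint32_t phys_end)
{
    uint32_t count = 0;

    assert(pt != NULL);
    assert(phys_start % PAGE_SIZE == 0u);
    assert(phys_start <= phys_end);
    assert(phys_end <= 0x00400000u);   /* one page table covers 4 MB */

    for (uint32_t i = 0; i < 1024u; i++) {
        pt[i] = 0;                     /* not present */
    }
    for (uint32_t addr = phys_start; addr < phys_end; addr += PAGE_SIZE) {
        pt[(addr >> 12) & 0x3FFu] = addr | PTE_WRITABLE | PTE_PRESENT;
        count++;
    }
    return count;
}
```

A kernel occupying 0x00100000 up to 0x00103001 would need four page frames this way, instead of a whole 4 MB page.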
Virtual Memory Through Paging
Paging enables two things that are good for virtual memory. First, it allows for fine-grained access control to memory. You can mark pages as read-only, read-write, only for PL0, etc. Second, it creates the illusion of contiguous memory. User-mode processes, and the kernel, can access memory as if it were contiguous, and the contiguous memory can be extended without the need to move data around in memory.
We can also allow user-mode programs access to all memory below 3 GB, but unless they actually use it, we don’t have to assign page frames to the pages. This allows a process to have its code located near 0x00000000 and its stack just below 0xC0000000, and still not require more than two actual page frames.
Further reading
If you are interested in learning more about implementing virtual memory by paging, please refer to the links below:
- http://en.wikipedia.org/wiki/Paging
- http://wiki.osdev.org/Paging
- http://wiki.osdev.org/Higher_Half_bare_bones
- http://duartes.org/gustavo/blog/post/anatomy-of-a-program-in-memory
- http://flint.cs.yale.edu/cs422/doc/ELF_Format.pdf
You can download the completed code that I have created for implementing virtual memory with paging for the OS from: here
Catch you in the next article.
Thank you!