Introduction
Welcome to the first in a series of articles about the secrets of
the Linux kernel. You have probably taken a look at the kernel
sources at some point. If so, you will have noticed that the
initial pair of 100 KB compressed files has grown into more
than 300 files containing more than 2 million lines of source code
and taking up as much as 9 MB of compressed storage.
This series is intended not for newbies but for advanced
programmers. You are of course free to read it anyway, and the
author will do his best to answer any questions or doubts you send
by e-mail.
New bugs are discovered and new patches are published almost
every day. Nowadays it is almost impossible to understand the
source code as a whole. It is co-written by many different
programmers who try to keep a homogeneous coding style, but in
practice the style differs from one author to another.
Linux: The Internet Operating System
Linux is a freely distributable operating system for the PC
architecture and others. It is compatible with the POSIX 1003.1
standard and includes a large number of features from Unix System
V and BSD 4.3. Many substantial parts of the Linux kernel covered
in this series were written by Linus Torvalds, a Finnish
computer science student. The first kernel was released in
November 1991.
Main Features
Linux covers almost all the needs of a current Unix-based operating system:
- Multitasking
Linux supports true multitasking. All processes are
independent, and none of them has to release the processor
voluntarily for another process to run.
- Multiuser accessibility
Linux is not only a multiuser operating system, but also offers multiuser
access. It can share the same system resources among users connected
through different terminals attached to the host.
- Executables loaded on demand
Only the parts of a program that are actually needed are loaded into
memory for execution.
- Memory paging
If system memory is fully exhausted, Linux looks for 4 KB memory pages
that can be released from memory and stored on the hard disk. If any of
these pages is required again, Linux restores it from disk back into
memory. Older Unix systems and some current platforms, including
Microsoft Windows, swap memory to disk instead. That means that all
memory pages belonging to a task are saved to disk when there is a memory
shortage, which is less efficient.
- Dynamic disk cache
MS-DOS users are used to working with SmartDrive, a program that reserves
a fixed area of system memory for disk caching. Linux instead has a much
more dynamic disk caching system: the memory reserved for the cache
grows when memory is unused and shrinks as needed when system or user
processes demand more memory.
- Shared libraries
Libraries are sets of routines used by programs to process data. A number
of standard libraries are used by more than one process at the same
time. On older systems these libraries are included in every executable
file and loaded redundantly into memory every time a new process using
the same library is executed, which wastes memory space. On modern
systems like Linux, shared code is loaded just once and shared among all
the processes that use it.
- 100% compliance with the POSIX 1003.1 standard. Some System V and BSD
features supported.
POSIX 1003.1 defines a standard interface for Unix operating
systems. This interface is described as a set of C routines, and is
currently supported by most modern operating systems. Microsoft
Windows NT has support for POSIX 1003.1. Linux 1.2 is 100%
compliant with POSIX. Additionally, some System V and BSD
interfaces are supported or being implemented for further
compatibility.
- Several executable file formats
Who would not like to run DOS, Windows 95, FreeBSD or OS/2
applications under Linux? For this reason DOS, Windows and
Windows 95 emulators are under development. Linux can also run
binaries from other Intel-based Unix platforms that comply with
the iBCS2 (Intel Binary Compatibility Standard) specification.
- Several filesystem formats
Linux supports a large number of file system formats. The most
commonly used format nowadays is the Second Extended File
System (Ext2). Another supported file system format is the File
Allocation Table (FAT) used by DOS-based systems, although FAT is
not suited to security or multiuser access because of its design
restrictions.
- Networking
Linux can be integrated into any local area network. The usual Unix
services are supported, including Network File System (NFS), remote login
(telnet, rlogin), dial-up SLIP and PPP, and so on. Integration as a server
or client for other networks is also supported, including file sharing and
printing for Macintosh, NetWare and Windows.
- System V IPC
Linux uses this technology to provide inter-process message queuing,
semaphores and shared memory.
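As a minimal user-space sketch of the message queue facility (this is
not kernel code), the function below sends one message through a private
System V queue and reads it back. It assumes a Linux system with System V
IPC enabled. msgget(), msgsnd(), msgrcv() and msgctl() are the standard
calls; struct demo_msg and sysv_queue_roundtrip() are hypothetical names
chosen for this illustration.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* Hypothetical message layout. System V queues require the message to
 * start with a long "type" field, followed by the payload. */
struct demo_msg {
    long mtype;       /* message type, must be greater than 0 */
    char mtext[64];   /* payload */
};

/* Send `text` through a private System V message queue and read it back
 * into `out`. Returns 0 on success and -1 on any failure. */
int sysv_queue_roundtrip(const char *text, char *out, size_t out_size)
{
    struct demo_msg msg;
    size_t len = strlen(text) + 1;   /* include the terminating '\0' */
    ssize_t received;
    int qid;

    if (len > sizeof msg.mtext || len > out_size)
        return -1;

    /* IPC_PRIVATE asks for a brand-new queue visible only through qid. */
    qid = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
    if (qid == -1)
        return -1;

    msg.mtype = 1;
    memcpy(msg.mtext, text, len);
    if (msgsnd(qid, &msg, len, 0) == -1) {
        msgctl(qid, IPC_RMID, NULL);
        return -1;
    }

    /* Fetch the first message of type 1, then remove the queue. */
    memset(&msg, 0, sizeof msg);
    received = msgrcv(qid, &msg, sizeof msg.mtext, 1, 0);
    msgctl(qid, IPC_RMID, NULL);
    if (received == -1)
        return -1;

    memcpy(out, msg.mtext, (size_t)received);
    return 0;
}
```

In a real program the sender and the receiver would normally be two
different processes sharing the queue identifier. Both ends are kept in
one function here only to keep the sketch short.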
Compiling the Kernel
Let's take a look at the kernel source code before studying the
kernel itself.
Source tree structure:
Linux kernel sources are commonly located under the /usr/src/linux directory,
so we'll give directories relative to this location. As a result of the
porting to non-Intel architectures, the kernel tree was changed after version
1.0. Architecture-dependent code is located under the arch/ hierarchy. Code
for Intel 386, 486, Pentium and Pentium Pro processors is under arch/i386. The
arch/mips directory is for MIPS-based systems, arch/sparc for Sun SPARC-based
platforms, arch/ppc for PowerPC/Power Macintosh systems, and so on. We'll
concentrate on the Intel architecture as this is the most widely used with Linux.
The Linux kernel is just a standard C program. There are only two important
differences. First, the starting point for programs written in the C language
is the main(int argc, char **argv) routine, whereas the Linux kernel uses
start_kernel(void). Second, the program environment does not exist yet when
the system is starting up and the kernel is about to be loaded. This means
that a few things have to be done before the first C routine is called. The
assembler code that performs this task is located under the arch/i386/boot/
directory.
The appropriate assembler routine loads the kernel at the absolute
memory address 0x100000 (1 MB), then installs the interrupt service
routines, the global descriptor table (GDT) and the interrupt descriptor
table (IDT), which are used only during the initialization process. At this
point, the processor is switched into protected mode. The init/ directory
contains everything needed to initialize the kernel. It holds the
start_kernel() routine, which initializes the kernel properly, taking into
account all the boot parameters passed to it. The first process is created
without using system calls (the system itself is not loaded yet). This is the
famous idle process, the one that uses processor time when no other process
needs it.
The kernel/ and arch/i386/kernel/ directories contain, as their path names
suggest, the main parts of the kernel. This is where the main system calls
are located. Other facilities are implemented here too, including the time
handler, the scheduler, the DMA manager, the interrupt handler and the
signal controller.
Code handling system memory is located in mm/ and arch/i386/mm/.
This area is responsible for allocating and releasing memory for processes.
Memory paging is also implemented here.
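Paging works in fixed-size units. As a rough illustration only (this is
not kernel code), the following C sketch assumes 4 KB pages, as on the
i386 architecture, and shows how a byte count maps to a number of pages
and how an address splits into a page number and an offset. All the
DEMO_ and demo_ names are hypothetical.

```c
#include <stddef.h>

/* Assumption: 4 KB pages (2^12 bytes), as on the i386 architecture. */
#define DEMO_PAGE_SHIFT 12
#define DEMO_PAGE_SIZE  ((size_t)1 << DEMO_PAGE_SHIFT)

/* Number of whole pages needed to hold `bytes` bytes (rounds up). */
size_t demo_pages_needed(size_t bytes)
{
    return (bytes + DEMO_PAGE_SIZE - 1) >> DEMO_PAGE_SHIFT;
}

/* Index of the page that contains address `addr`. */
size_t demo_page_number(size_t addr)
{
    return addr >> DEMO_PAGE_SHIFT;
}

/* Offset of address `addr` inside its page. */
size_t demo_page_offset(size_t addr)
{
    return addr & (DEMO_PAGE_SIZE - 1);
}
```

With these definitions, a request for 4097 bytes needs two pages, and the
address 0x100000 (1 MB) falls at the very start of page 256.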
The Virtual File System (VFS) is under the fs/ directory.
The different supported file system formats are located in their own
subdirectories. The most important file systems are Ext2 and Proc.
We'll take a detailed look at them later.
All operating systems require a set of drivers for hardware components.
In the Linux kernel, these are located under drivers/.
Under ipc/ you will find the Linux implementation of the System V IPC.
Source code to implement several network protocols, sockets and internet
domains is stored under net/.
Some standard C routines are implemented in lib/, so that kernel code
can follow the usual C programming habits.
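The kernel cannot rely on the C library used by ordinary programs, so
routines of this kind have to be written without any library support. The
sketch below is illustrative only and is not the code found in lib/. The
name demo_strlen is hypothetical.

```c
#include <stddef.h>

/* Hypothetical stand-alone string length routine. It calls no library
 * function, so it could be used where no C library is available. */
size_t demo_strlen(const char *s)
{
    const char *p = s;

    /* Walk forward until the terminating '\0' is found. */
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```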
Loadable modules generated during the kernel compilation are saved in
modules/, which stays empty until the first kernel compilation is done.
Probably the most important directory for programmers is include/.
Here you will find all the C header files specifically used by the kernel.
Kernel header files specific to Intel platforms are under include/asm-i386/.
Compiling:
A new kernel is basically generated in just three steps:
- First, the kernel's customizable options are configured with "make config",
"make menuconfig" or "make xconfig" (different interfaces for the same
configuration stage)
- Then, all source code dependencies are rebuilt with "make depend"
- Finally, the actual kernel compilation is performed with "make"
We will go into detail about how these scripts work, and how to modify them
to introduce new configuration options, in the next articles.
I hope you enjoyed this article. You're free to e-mail your comments,
suggestions and criticisms to elesende@nextwork.net.