LOCAL BUS SYSTEMS IN THE AT-PC AND HOW TO PROGRAM DEVICES:

Here I describe some parallel buses, sometimes called "local" buses, which are used to control devices, some of them on pluggable cards. Besides address and data lines, these buses have further lines for interrupts and for controlling states (e.g. reset and PCI BIST).
How a device is controlled depends on its controller. The registers of the controllers and the meanings of their bits are not described here; only the interface to the address spaces and the control by the CPU are. This is what matters when you want to control devices programmatically by writing "device drivers".

ISA ("Industry Standard Architecture"):

The ISA bus is of interest only for historical reasons, but it survives as an interface standard inside PCI systems, because in some respects it directly represents the I/O address range of the CPU: an 8/16-bit data bus and an address bus up to 24 bits wide (20 bits on the original 8-bit PC bus). The I/O addresses are programmed with the I/O instructions "in" and "out", which are typical for Intel CPUs. Details about this are not explained here; I only mention the keywords "IDT" and "interrupt controller".
The system clock, the IRQ lines and the DMA lines are also part of this bus and are standardized connections in the AT-PC design. Which of these lines a controller uses, however, is selectable: on the ISA bus this is done with jumpers, an almost "hard-wired" connection. The same method is used to map the registers of a controller into the address space of the CPU.
In addition, other address ranges of a controller, such as buffers and BIOS extensions, must be mapped into the address ranges of the CPU.
Because jumpers for addressing, the limited number of data and address bits and the slow clock (about 8 MHz) were regarded as disadvantages, Intel promoted the PCI bus, which was meant to make pluggable devices more independent of the AT-PC design. Nevertheless, the "bridges" between the PCI bus and the host machine still had to present a native interface, which in an AT-PC is an ISA-like one.
Before a PCI card can be addressed like an ISA card, it now has to be initialized, a step that was not needed on the ISA bus.

PCI ("Peripheral Component Interconnect"):

Unlike the ISA bus, the PCI bus has its own address range for configuring the devices on the bus. This allows software to do what jumpers did on the ISA bus. Every device has registers in its own configuration space, which cannot be addressed directly over the I/O or memory address bus. Instead, a special PCI address has to be written to a register of the bridge, which is an I/O port.
A similar index/data method is already known from the CMOS clock (ports 70h/71h) and from VGA cards. But a PCI address is more than an offset, so more than one type of PCI address has been standardized.
If you want to know whether a PCI bus is present in the system, you only have to read the address port of the bridge: if there is no PCI bus, every bit reads as 1. This is the same test used to check whether any other register exists. The same test, applied with PCI addresses, finds the existing configuration spaces. Within an existing configuration space, however, a register that is not implemented reads as 0 in every bit.

ADDRESSING OF CONFIGURATION SPACES ON THE PCI BUS:

Before any register of any controller can be accessed, its addresses must be mapped into the address spaces of the CPU. This is done by defining a base address to which the register addresses, which are offsets, are added. In the AT-PC two address ranges are available for mapping (I/O and memory). Which one is used depends on the construction of the controller, not on rules of PCI! The same applies to interrupt lines, which are mapped onto the lines available at the native interrupt controller of the AT-PC. The final assignment is made in the interrupt handlers, however, because PCI has only 4 interrupt lines, on which the requests of different controllers are ORed.
The bridge to PCI bus 0 (often called the "north bridge") has two 32-bit registers in the I/O address range of the CPU: the configuration address port at CF8h and the configuration data port at CFCh.
Whenever you want to access the configuration space of a device, you first write a special PCI address to the address port. This maps the addressed 32-bit register inside the configuration space to the data port, where you can read or write its contents, 32 bits at a time. If you want to access only 8 or 16 bits, you change the I/O address, not the PCI address! Bits 0-7 are at port CFCh, the higher bits at CFDh, CFEh and CFFh.
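The two steps (select the DWORD through CF8h, then pick the byte lane through CFCh-CFFh) can be sketched in C as follows. Because real port I/O needs privileges and hardware, model_outl() and model_in() are a hypothetical software model of the bridge with a single configuration space at bus 0, device 0, function 0 holding example IDs; on real hardware they would be replaced by the "out" and "in" instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software model of the host bridge: one configuration space
   (bus 0, device 0, function 0) with example IDs in register 00h. */
static uint8_t  model_cfg[256] = { 0x86, 0x80, 0x37, 0x12 };
static uint32_t model_cf8;                   /* last value written to port CF8h */

static void model_outl(uint16_t port, uint32_t value)
{
    if (port == 0xCF8)
        model_cf8 = value;
}

/* Read 'width' bytes (1, 2 or 4) from a data port in CFCh..CFFh. */
static uint32_t model_in(uint16_t port, int width)
{
    /* Enable bit clear or another bus/device/function: nobody answers, all bits 1. */
    if (!(model_cf8 & 0x80000000u) || (model_cf8 & 0x00FFFF00u) != 0)
        return 0xFFFFFFFFu >> (32 - 8 * width);
    unsigned base = (model_cf8 & 0xFCu) + (unsigned)(port - 0xCFC);
    uint32_t v = 0;
    for (int i = 0; i < width; i++)
        v |= (uint32_t)model_cfg[base + i] << (8 * i);
    return v;
}

/* Read 1, 2 or 4 bytes at byte offset 'off' of configuration space 0:0.0. */
static uint32_t cfg_read(uint8_t off, int width)
{
    assert(width == 1 || width == 2 || width == 4);
    assert((off & (width - 1)) == 0);                      /* naturally aligned */
    model_outl(0xCF8, 0x80000000u | (off & 0xFCu));        /* select the DWORD register */
    return model_in((uint16_t)(0xCFC + (off & 3)), width); /* select the byte lane */
}
```

Reading 16 bits at offset 02h selects register 00h through CF8h and then reads port CFEh, which yields the device ID without any shifting.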
A PCI address is not a standardized value tied to a certain kind of device, unlike standardized I/O addresses such as the ports of the PCI bridge. Which PCI address a device gets depends on which devices exist and where they are plugged in, so drivers cannot be attached without first searching for the device they depend on. Every existing device is found by reading the first register (vendor ID/device ID) of a configuration space, stepping through the configuration spaces by incrementing the PCI address (see below how to do that!). The value FFFFFFFFh is the criterion for a non-existent device. Other registers of a non-existent device read the same value, but an existing device may return it as well, so only the first register is a reliable test.
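A search as described above can be sketched in C like this. The callback type cfg_read_id_fn and the function fake_read_id() are hypothetical: the callback stands for a 32-bit read of register 00h through CF8h/CFCh, and the fake bus holds one device with two functions and made-up IDs. The sketch simply tries all 8 functions of every device; real scanners usually check the multi-function bit of the header type first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: returns register 00h (device ID << 16 | vendor ID). */
typedef uint32_t (*cfg_read_id_fn)(unsigned bus, unsigned dev, unsigned fn);

struct pci_found {
    unsigned bus, dev, fn;
    uint16_t vendor, device;
};

/* Try every bus/device/function; store up to 'max' existing functions in 'out'
   and return how many were found in total. */
static size_t pci_scan(cfg_read_id_fn read_id, struct pci_found *out, size_t max)
{
    assert(out != NULL || max == 0);
    size_t n = 0;
    for (unsigned bus = 0; bus < 256; bus++)
        for (unsigned dev = 0; dev < 32; dev++)
            for (unsigned fn = 0; fn < 8; fn++) {
                uint32_t id = read_id(bus, dev, fn);
                uint16_t vendor = (uint16_t)(id & 0xFFFFu);
                if (vendor == 0xFFFFu || vendor == 0)   /* absent, or unused (ID 0) */
                    continue;
                if (n < max) {
                    out[n].bus = bus;
                    out[n].dev = dev;
                    out[n].fn = fn;
                    out[n].vendor = vendor;
                    out[n].device = (uint16_t)(id >> 16);
                }
                n++;
            }
    return n;
}

/* Fake bus for testing: device 3 on bus 0 with functions 0 and 1 (made-up IDs). */
static uint32_t fake_read_id(unsigned bus, unsigned dev, unsigned fn)
{
    if (bus == 0 && dev == 3 && fn <= 1)
        return ((0x5678u + fn) << 16) | 0xABCDu;
    return 0xFFFFFFFFu;
}
```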
Before any device driver does this search, the AT BIOS has already done it at boot time. The BIOS also has to find those boot media that are not mapped to standardized I/O addresses, e.g. USB boot devices. To make such a search as easy as possible, some meanings of registers and bits are standardized. The PCI address written to CF8h for this purpose has the following bits (bus, device, function and register are at the same positions as in a type-1 address):
0,1 = 00 (read-only in CF8h; the bridge itself puts 01 here when it forwards an access to a bus behind another bridge as a type-1 cycle)
2-7 register byte address, but in DWORD steps! (00h, 04h, 08h ... FCh, of which 00h-3Ch is the standard header; the increment for 32 bits is 4!)
8-10 function#
11-15 device#
16-23 bus#
24-30 reserved
31 enable=H (must be set during a search!)
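The bit layout above can be packed into a value for CF8h with shifts and ORs. The function names are my own (hypothetical); the sketch writes bits 0,1 as 0, which is what the PCI specification requires for the CF8h register itself (the 01 pattern only appears on the bus in type-1 cycles).

```c
#include <assert.h>
#include <stdint.h>

/* Build the value written to the configuration-address port CF8h. */
static uint32_t pci_cfg_address(unsigned bus, unsigned dev, unsigned fn, unsigned reg)
{
    assert(bus < 256 && dev < 32 && fn < 8 && reg < 256);
    return 0x80000000u               /* bit 31: enable */
         | ((uint32_t)bus << 16)     /* bits 16-23: bus# */
         | ((uint32_t)dev << 11)     /* bits 11-15: device# */
         | ((uint32_t)fn << 8)       /* bits 8-10: function# */
         | (reg & 0xFCu);            /* bits 2-7: DWORD register; bits 0,1 = 0 */
}

/* Split such a value again, e.g. for printing what was accessed. */
static unsigned pci_addr_bus(uint32_t a) { return (a >> 16) & 0xFFu; }
static unsigned pci_addr_dev(uint32_t a) { return (a >> 11) & 0x1Fu; }
static unsigned pci_addr_fn(uint32_t a)  { return (a >> 8) & 0x07u; }
static unsigned pci_addr_reg(uint32_t a) { return a & 0xFCu; }
```

Adding 4 to reg steps to the next 32-bit register; single bytes inside a register are selected through the data port, not here.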
A type-0 address, which the bridge generates for devices on its own bus, differs from a type-1 address in having bits 0,1 = 00 and bits 11-31 reserved (= 0 as well). There are also special cycles, which I do not explain here, because they are normally not needed.
The function# refers to a "logical" device, which is part of the physical device selected by the device#. This distinction matters when a card carries more than one function, e.g. USB and Ethernet. In that case the device# is the same, but the function# differs. Every function is initialized through its own configuration space.
In a configuration space you can assign not only the register ranges of a controller but also its IRQ#. Register ranges and memory ranges are mapped by defining base addresses, which can lie in the I/O or the memory range. There are limitations, however, depending on AT standards, the installed physical memory, the CPU mode (real or protected) and of course the construction of the controller.

CONFIGURATION SPACES ON THE PCI BUS:

Every configuration space is 256 bytes long; its first 64 bytes form a standardized header of 16 DWORD registers. Each register is accessed as 4 bytes at once, in parallel, or byte by byte at the four I/O ports CFCh, CFDh, CFEh and CFFh. The meaning of most registers depends on the header type, which has to be read first if you know nothing about a device. I first explain the registers and bits whose meanings are always the same (decimal register# / hexadecimal byte offset):
Register 01/00h:
Bits 0-15 =vendor-ID, Bits 16-31 =device-ID (read only)
Read this register first if you search for a certain device (or for an existing configuration space)! Lists of which ID belongs to which vendor or device can be found on the internet (or in my sources). Every ID is unique. If an ID of 0 is found, the configuration space might be reserved but not yet used; ignore it...
Register 02/04h:
Bits 0-15 = command, with the following bit meanings:
Bit 0: Enable I/O =H
Bit 1: Enable memory =H
Because both bits can be set, the same registers of a controller can be mapped into the I/O range and into the memory range!
Standardized ranges, such as the VGA registers, do not appear in configuration spaces.
Bit 2: Enable bus master =H
Bit 3: Enable special cycles =H
Bit 4: Enable memory write and invalidate =H
Bit 5: Enable VGA palette snooping =H
Bit 6: Enable parity error response (PERR#) =H
Bit 7: formerly enable address/data stepping =H, now reserved!
Bit 8: Enable SERR# =H
Bit 9: Enable fast back-to-back =H
Bit 10: formerly reserved, now interrupt disable =H (set it if there is no interrupt handler!)
Bits 11-15 reserved
A command of 0 makes every register of a device unreachable except those needed for configuration! If you redefine base addresses, first clear at least bits 0,1!
Bits 16-31 = status, with the following bit meanings (bit 16 corresponds to bit 0):
Status bits cannot be set by writing: writing a 1 to a bit clears it, writing a 0 leaves it unchanged!
Bits 0-2 reserved
Bit 3: formerly reserved, since PCI version 2.3 interrupt status
Bit 4: "capabilities pointer" =H; makes the value in register 14, bits 0-7, valid.
Bit 5: 66 MHz capable =H / 33 MHz only =L
Bit 6: reserved
Bit 7: fast back-to-back transactions capable =H
Bit 8: master data parity error =H (set by the device as bus master when PERR# was asserted)
Bits 9-10: DEVSEL# timing: LL = fast, LH = medium, HL = slow, HH = reserved
Bit 11: signaled target abort =H (set by the device as target)
Bit 12: received target abort =H (set by the device as bus master)
Bit 13: received master abort =H (set by the device as bus master)
Bit 14: signaled system error =H (SERR# asserted)
Bit 15: detected parity error =H
Register 03/08h is byte-oriented and read only:
LL byte = revision ID
The next 3 bytes are the class code, consisting of:
LH byte = register-level programming interface
HL byte = subclass
HH byte = base class
Register 04/0Ch is byte-oriented and read only:
LL-Byte =cache-size
LH-Byte =latency-timer
HL byte: header type (read-only)
This byte defines the meaning of the following registers! The normal meaning, type 0, is described here...
Other types already defined are:
01h = PCI-to-PCI bridge, 02h = CardBus bridge
Other types are reserved...
Bit 7 (MSB) =H : the device is a multi-function device
Read this byte before initialization!
HH byte: BIST ("Built-In Self Test") with the following meanings:
Bits 0-3: completion code; success = LLLL, other values indicate an error
Bits 4-5: reserved
Bit 6: set =H to start BIST; the device clears it when BIST is complete
Bit 7 =H : BIST supported
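The byte fields of registers 03/08h and 04/0Ch can be taken apart with shifts and masks. The following C sketch is only an illustration; the function names are hypothetical and the example values in the test are made up. A scanner can use is_multifunction() to decide whether functions 1-7 of a device need to be probed at all.

```c
#include <assert.h>
#include <stdint.h>

struct pci_class {
    uint8_t revision;     /* LL byte of register 08h */
    uint8_t prog_if;      /* LH byte: register-level programming interface */
    uint8_t subclass;     /* HL byte */
    uint8_t base_class;   /* HH byte */
};

static struct pci_class decode_class(uint32_t reg08)
{
    struct pci_class c = {
        (uint8_t)reg08, (uint8_t)(reg08 >> 8),
        (uint8_t)(reg08 >> 16), (uint8_t)(reg08 >> 24)
    };
    return c;
}

/* Fields of register 0Ch (HL byte = header type, HH byte = BIST). */
static unsigned header_layout(uint32_t reg0c) { return (reg0c >> 16) & 0x7Fu; } /* 0, 1 or 2 */
static int is_multifunction(uint32_t reg0c)   { return (reg0c >> 23) & 1u; }    /* header bit 7 */
static int bist_capable(uint32_t reg0c)       { return (reg0c >> 31) & 1u; }    /* BIST bit 7 */
static int bist_running(uint32_t reg0c)       { return (reg0c >> 30) & 1u; }    /* BIST bit 6 */
static unsigned bist_code(uint32_t reg0c)     { return (reg0c >> 24) & 0x0Fu; } /* 0 = success */
```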

The meaning of the following registers is dependent on the type of the header.
Registers 05/10h - 10/24h contain base addresses of registers, buffers, windows or ROM ranges (BIOS).
Registers that are not implemented read 0!
The bits of a base address register have different meanings for memory and I/O references:
Bit 0 =H: base address in the I/O range. In this case bit 1 is reserved and bits 2-31 are the base address (with bits 0,1 taken as 0!)
Bit 0 =L: base address in memory, which is absolute and physical. In this case bits 4-31 are the base address (with bits 0-3 taken as 0!). Bits 1-3 then have the following meanings:
Bits 2,1 =LL : base address in the 32-bit address range
Bits 2,1 =HL : base address in the 64-bit address range (the next register holds bits 32-63)
Bits 2,1 =LH and =HH are reserved
Bit 3 =H : prefetchable
These base addresses are always defined first by the AT BIOS, possibly with help from the BIOS extensions of devices. If you want to play with them, you have to respect every(!) base address of every(!) device!
There must not be a conflict at any address, because some registers, especially status registers, are also written by the controllers. A conflict can make two devices drive the bus at the same time, which can result in a short circuit!
As I strongly recommend not touching these definitions, I only mention that there is a standardized method to find out the length of a defined range: write all ones to the register and read it back. The low address bits cannot be redefined and read back as 0 ("don't care bits", which are in fact "don't touch bits").
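Decoding the bits above, and turning the read-back value of the length test into a size, can be sketched in C as follows. The sketch assumes the caller has already written FFFFFFFFh to the register, read it back and restored the original value (with I/O and memory decoding switched off meanwhile); the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

static int bar_is_io(uint32_t bar)         { return bar & 1u; }          /* bit 0 */
static unsigned bar_mem_type(uint32_t bar) { return (bar >> 1) & 3u; }   /* bits 2,1: 0 = 32-bit, 2 = 64-bit */
static int bar_prefetchable(uint32_t bar)  { return (bar >> 3) & 1u; }   /* bit 3 */

/* Base address with the flag bits masked out. */
static uint32_t bar_base(uint32_t bar)
{
    return bar_is_io(bar) ? (bar & ~0x3u) : (bar & ~0xFu);
}

/* Size of the range from the value read back after writing FFFFFFFFh.
   The address bits that stay 0 cannot be changed, so the lowest bit that reads
   back as 1 is the size. Returns 0 for an unimplemented register. For a 64-bit
   BAR only the low 32 bits are considered here. */
static uint32_t bar_size(uint32_t readback)
{
    uint32_t bits = readback & (bar_is_io(readback) ? ~0x3u : ~0xFu);
    return bits & (~bits + 1u);      /* isolate the lowest set bit */
}
```

Taking the lowest set bit instead of simply inverting the value also works for I/O registers that implement only 16 address bits and read back 0 in the upper half.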
Register 11/28h contains the cardbus CIS-pointer
Register 12/2Ch contains in bits 0-15 the subsystem-vendor-ID and in bits 16-31 the subsystem-ID (read only)
Register 13/30h contains the base address of an expansion ROM (BIOS) with the following bit meanings:
Bit 0: Enable =H
Bits 1-10 reserved
Bits 11-31: base address in memory, with bits 0-10 taken as 0!
Register 14/34h, bits 0-7, is the "capabilities pointer" if bit 4 of the status register is H. It points to a list of structures whose contents depend on the controller; because of this, the data sheet matters here, not these bits. The other 24 bits are reserved.
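Although the contents of each capability depend on the controller, the chaining of the list follows a fixed pattern: byte 0 of each entry holds a capability ID, byte 1 the offset of the next entry (0 ends the list). A C sketch working on a hypothetical 256-byte copy of one configuration space; sample_cfg is made up, using the IDs 01h (power management) and 05h (MSI):

```c
#include <assert.h>
#include <stdint.h>

/* Made-up configuration space image: status bit 4 set, capabilities pointer 40h,
   entry 40h = ID 01h -> entry 50h = ID 05h -> end of list. */
static const uint8_t sample_cfg[256] = {
    [0x06] = 0x10, [0x34] = 0x40,
    [0x40] = 0x01, [0x41] = 0x50,
    [0x50] = 0x05, [0x51] = 0x00,
};

/* Return the offset of the capability with the given ID, or 0 if there is none. */
static uint8_t pci_find_cap(const uint8_t cfg[256], uint8_t id)
{
    if (!(cfg[0x06] & 0x10u))            /* status bit 4 (= bit 20 of register 04h) */
        return 0;
    uint8_t ptr = cfg[0x34] & 0xFCu;     /* capabilities pointer, low 2 bits reserved */
    /* Entries lie above the 64-byte header; the guard stops a looping list. */
    for (int guard = 0; ptr >= 0x40 && guard < 48; guard++) {
        if (cfg[ptr] == id)              /* byte 0: capability ID */
            return ptr;
        ptr = cfg[ptr + 1] & 0xFCu;      /* byte 1: next pointer, 0 ends the list */
    }
    return 0;
}
```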
Register 15/38h reserved
Register 16/3Ch defines the interrupt-routing in byte-registers:
LL byte = IRQ input at the interrupt controller, set by the BIOS during POST
LH byte = IRQ pin (read-only), standardized for only 4 lines:
1 = INTA#, 2 = INTB#, 3 = INTC#, 4 = INTD# ... a 0 means the device does not request interrupts.
A device that is not a multi-function device uses only INTA#, which is ORed with the requests of other devices!
HL byte = Min_Gnt (read-only): length of the burst period the device needs
HH byte = Max_Lat (read-only): how often the device needs access to the PCI bus
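Register 16/3Ch can be decoded like this; a small C sketch with hypothetical function names, where the example values in the test (pin INTA#, IRQ 11) are made up:

```c
#include <assert.h>
#include <stdint.h>

static uint8_t irq_line(uint32_t reg3c) { return (uint8_t)reg3c; }          /* LL byte, set by the BIOS */
static uint8_t min_gnt(uint32_t reg3c)  { return (uint8_t)(reg3c >> 16); }  /* HL byte */
static uint8_t max_lat(uint32_t reg3c)  { return (uint8_t)(reg3c >> 24); }  /* HH byte */

/* LH byte: returns 'A'..'D' for INTA#..INTD#, or 0 if no interrupt pin is used. */
static char irq_pin(uint32_t reg3c)
{
    uint8_t pin = (uint8_t)(reg3c >> 8);
    return (pin >= 1 && pin <= 4) ? (char)('A' + pin - 1) : 0;
}
```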
Interrupt handling is not simple, because the interrupt controllers have only 15 usable lines, most of which are reserved for standardized purposes.
For this reason it may be better not to write a single interrupt handler that wastes time finding the source of a request. Instead, switch off the unused requests (interrupt disable) and install an interrupt handler optimized for the actual task; this is really simple only under ASMOS.

If you write to registers defined as "read-only", nothing happens. But if you change reserved bits (set or reset), errors can occur! Read the contents first and then change only the bits of interest using AND and OR.
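For the command register this read-modify-write needs one more precaution: a 32-bit write to register 02/04h also writes the status bits, and writing a 1 there clears them. The following C sketch (hypothetical names; the DWORD read from and written to 04h is assumed to be handled by the caller) keeps the unchanged command bits and writes 0 into the status half.

```c
#include <assert.h>
#include <stdint.h>

#define CMD_IO            (1u << 0)   /* bit 0: enable I/O */
#define CMD_MEMORY        (1u << 1)   /* bit 1: enable memory */
#define CMD_MASTER        (1u << 2)   /* bit 2: enable bus master */
#define CMD_INTX_DISABLE  (1u << 10)  /* bit 10: interrupt disable */

/* From the DWORD read at register 04h, compute the DWORD to write back:
   command bits in 'set' are ORed in, those in 'clear' are ANDed out, all other
   command bits (including reserved ones) keep their value. The status half is
   written as 0, because writing 1 there would clear pending status bits. */
static uint32_t cmd_update(uint32_t reg04, uint16_t set, uint16_t clear)
{
    assert((set & clear) == 0);              /* a bit cannot be set and cleared */
    uint16_t cmd = (uint16_t)(reg04 & 0xFFFFu);
    cmd = (uint16_t)((cmd | set) & ~clear);
    return cmd;
}
```

To switch off decoding before redefining base addresses, one would pass CMD_IO | CMD_MEMORY as 'clear'.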
Many devices come with their own BIOS extension (e.g. video cards). I do not recommend using it, because normally too much has to be done to make it usable. You need at least special 16-bit descriptors (no problem in ASMOS), and you must keep them untouched (in ASMOS recommended only during boot time, using drivers...).
The much better way to success is to deal directly with the registers of a controller. This is easy if the device is standardized in the AT design; then there is no need to read configuration spaces at all.
Some devices have two register ranges (e.g. video cards): one is standardized (such as VGA), the other is mapped through the configuration space.
In any case, the meanings of the registers and bits of a controller are not part of the AT or PCI standards; they are defined in the vendors' data sheets and documentation.
On the other hand, most supposedly well-documented base addresses are wrong (even some stored in BIOS areas)! Only those defined in a configuration space are true!

There is also a PCI BIOS, available in real mode through INT 1Ah (functions with AH = B1h). Using it in protected mode needs an initialization and special descriptors; forget it...

DRIVERS FOR DEVICES ON THE PCI BUS:

Here, "drivers" means programs that deal directly with the controllers of devices, doing I/O and addressing control registers. This needs to be stated because programs that deal with abstractions defined in other programs (e.g. TWAIN drivers, the X server...) are often called "drivers" too.
Drivers normally consist of two clearly distinguishable layers: one follows from the construction and purpose of the controller, the other from the operating system under which the driver runs.
If you want to write a driver for devices that in an AT-PC are now normally interfaced through a PCI bus, you have to do the following for each controller:
You have to study the purpose and registers of the controller and then write the sequences for initialization and operation. Almost every working state also needs interrupt handling, because of asynchronous events not clocked by the CPU. All of these sequences are assignments of values to registers, which often depend on single bits and therefore need calculation in the CPU (OR, AND, shift). Finally, usually more than one controller is involved in a working state, e.g. timer, interrupt controller and DMA controller. At the least, the PCI bridge controller is at work too.
Unlike other programs, which make assignments in the memory range at relocatable addresses, drivers need to assign values to absolute addresses in the I/O or memory range. These addresses are either standardized or given by controllers that make them mappable in the PCI configuration space. Where addresses are mappable, the BIOS does the mapping and you need not touch its decisions. But you have to make these base addresses known to the driver, so you have to deal with the PCI controller and the configuration space as described above.
As drivers use addresses known in advance, an operating system only has to provide a descriptor for the whole memory range and the base address of the IDT, so that the driver can install its interrupt handlers.
Data that the driver should move for other programs can be passed to or from the driver in CPU registers: normally a source or destination base address and the length of the data. And of course some bits are needed to command what to do.
That is how easily it is done under an operating system such as ASMOS.
You cannot do it more easily. But of course you can do it in a more complicated way!

The worst mistake you can make when linking drivers to other programs (the "user space") is to code drivers in "higher" languages, namely C/C++, which seem to allow it.
No "higher" language can deal with descriptors, which in protected mode define the base addresses of programs, data or memory-mapped registers. So the seemingly fine solution is to use paging. But this is no solution, because the time needed to combine addresses in paging mode turns the advantage of memory-mapped registers into a disadvantage! In paging mode three additional transfers of values from "page tables" are needed to complete an address. This costs at least 50% of speed, which is exactly the time you gain with memory-mapped registers. In short: you not only waste a lot of memory for page tables, but also lose the speed needed to handle sound or movies (...and other desirable things).
Of course you can waste even more time and memory! Use multitasking...
These facts cannot be disputed. But many people do not give up. Instead of using common sense, they perform a quasi-religious conjuration of the advantages that UNIX-C/C++ multitasking supervisor systems supposedly offer. Above all, "machine independence" is conjured up as a "system philosophy". But this is really nonsense when you need to command a machine, which cannot be done in any other way than machine-dependently!
Such funny systems exist anyway, so you can study how delusion works. You can see that seemingly shorter source code does not shorten anything. In short: not only more binary code, not only more source code, not only exploding file systems and protocol layers, not only vast work for linking, not only endless compile times during installation, but also no chance to reuse code or to adopt the advantages of technical evolution. This makes the enjoyable part of a programmer's life much shorter...
Such systems have only one undisputable advantage: they are not intelligible and therefore can be sold. Especially the documentation needed for linking can be sold at high prices, sometimes too high. The European Commission recently stated this and condemned Mikros$ft... (This company also made a lot of money selling C/C++ compilers.)
What drivers look like as a consequence of such a system philosophy can be read in open-source code compiled with the GNU compiler. You can watch the foolish dances needed to make different things into one thing with one name. This virtualization is the cost of machine independence, well hidden inside a mighty supervisor operating system. One consequence is that many advantages of many machines are not used, first of all the very well designed protected mode of the CPU and the use of descriptors and selectors in user space.
You can also see there that a lot of work has to be done writing and searching during programming, and that the really necessary steps are hidden in fog. To help beginners see through the fog, I have to write a few more lines.

As said, register addresses and the meanings of bits are the only things to work with. In C sources you find them in header files (filename extension *.h), in long lists of #define statements, like this:
#define CRTC_H_SYNC_STRT_WID 0x0004
The necessary part of this expression is the easily expressed 4, which is an offset address. In an assembler program it is an immediate value in an instruction that also makes an assignment. A C/C++ programmer instead has to deal with the unreadable "CRTC_H_SYNC_STRT_WID" and cannot add the offset to the base without a type cast. Only the time-wasting (searching) preprocessor moves the 4 to the right place in the program, while the programmer has no less to do, marking the destination of the 4 with that unreadable macro. And of course he has to type 20 characters instead of 1 without a mistake!
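For comparison, this is roughly what the C side of such an access looks like: the offset macro is added to a base address through a byte-pointer cast. The sketch is hypothetical: fake_window is an ordinary array standing in for a register window that a real driver would get mapped by its operating system, and u32 is defined locally the way such headers typically do.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;                 /* local stand-in for a kernel-style u32 */
#define CRTC_H_SYNC_STRT_WID 0x0004   /* byte offset, as in the header line quoted above */

/* Hypothetical register window: a real driver would use the mapped base address. */
static u32 fake_window[4];

/* The cast to a byte pointer makes 'off' count bytes instead of u32 units. */
static void reg_write32(volatile void *base, unsigned off, u32 value)
{
    assert((off & 3) == 0);           /* 32-bit registers are DWORD aligned */
    *(volatile u32 *)((volatile uint8_t *)base + off) = value;
}

static u32 reg_read32(volatile void *base, unsigned off)
{
    assert((off & 3) == 0);
    return *(volatile u32 *)((volatile uint8_t *)base + off);
}
```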
Meanings of bits are virtualized in the same way. In the best case a mask is defined, not in binary (as in assembler) but in hexadecimal, so a pocket calculator has to be within reach. In the worst case the definition is spread over many files, changing its name along the way.
The easy-to-use instructions "in" and "out" are foolishly disguised in a similar way, for example as functions like "inb(port)" and "outb(value, port)", where the parameters can be macros to be searched for in other files. The equivalent assembler instruction "in al,dx", by contrast, follows a sequence of instructions that tell clearly which value is moved where. You may find this complicated, but in the worst case the controller will teach you the nature of these things. Functions like inb() are declared inline, but whether the ordinary C parameter passing really disappears depends on the compiler and its settings. That is why desperate encapsulations of inline assembler inside macros are sometimes found.
The interesting sequences of moves and addressing are normally written as assignments from structs, like this:
u32 ref_clk_per = info->ref_clk_per;
If you do not know that "u32" is a type defined somewhere else, apparently meaning 32 bits, you have to search for enlightenment in all those #included files. It can happen that you do not find it there but in yet another file #included by those files.
But hell really opens up when driver code of a few hundred bytes has to be linked to certain protocols or file formats, e.g. Ethernet to TCP/IP and HTML. Although this could be a single browser, the working C/C++ code is normally spread over the whole system and linked through environment variables, which not only have to be compiled in but also handled at boot and run time. This puzzle requires a lot of searching.
My experience is that drivers written for an assembler operating system are at least 5 times shorter. Even the source code is not longer, because much of the #define and #include is superfluous. And of course you need no linker, because the addresses are absolute. I think you need to be a system philosopher rather than a programmer if you want to go higher instead of straight through.

If you believe me that drivers do exactly the same job without a special supervisor multitasking operating system, you should develop the sequences using assembler and ASMOS. You can assemble in fractions of a second using ASMn or ASMat, and you can easily test your program using files as source or destination for data or status. If anything goes wrong, a reboot takes 10 seconds. When everything runs well, you can still decide whether you want all that nasty virtualization and mystery needed in higher spheres. The NASM dialect can then stay untouched (usable as a "module"), because NASM runs on nearly every platform.
Detailed information on how to command the PCI controller can be found in the program "DUMP to file", a file in an FD file system readable under ASMOS and assembled with ASMat. You can find the file system as an FD image on my homepage. The program also contains code demonstrating how to call and use "FP_PCIsearch" in ASMOS and how to link easily to menu mode and input and output values. You will need much more code if you want this under other operating systems.
Besides this, the program demonstrates that you can shrink your effort to a few per mille using assembler and ASMOS. The PCI-dependent code is a few hundred bytes of binary in that program and in ASMOS. To code the same task under LINUX, Martin Mares wrote the "PCI Utilities". His package is more than half a million bytes, while I needed about 1000 bytes of source code.

  • Go to my homepage: www.rcfriz.de