84PCE:Wait States
Latest revision as of 11:12, 24 April 2022
Synopsis
The eZ80 processor can perform a memory access in a single cycle. On the TI-84+CE, however, accesses take longer because of wait states: each wait state adds one cycle, so a read from RAM, which has 3 wait states, takes 4 cycles. The wait states for parallel flash accesses can be customized, but it is unknown whether the same is true of other memory regions or of the newer serial flash.
Flash Access Wait States
Parallel Flash
Calculators produced prior to revision M use a parallel flash chip. These chips are limited to a maximum read rate under 20 MHz, necessitating at least 1 wait state for the ASIC's faster 48 MHz clock. The ASIC's internal memory mapping hardware adds a minimum of 5 wait states just to ferry the request to the flash chip and the response back to the CPU; additional external wait states are then required to give the flash chip time to reply. See 84PCE:Ports:1005 for more information.
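The arithmetic above can be sketched as follows. This is only an illustration: it assumes that the configurable external wait states simply add to the 5 internal ones (as the wait state table on this page describes for port 1005) and that every access costs one cycle plus its wait states. The function and constant names are hypothetical.

```python
INTERNAL_WAIT_STATES = 5      # minimum added by the ASIC's memory mapping hardware
CPU_CLOCK_HZ = 48_000_000     # full-speed CPU clock

def parallel_flash_read(extra_wait_states):
    """Return (wait states, cycles, nanoseconds) for one parallel flash read.

    extra_wait_states is the configurable external part (assumed additive).
    """
    wait_states = INTERNAL_WAIT_STATES + extra_wait_states
    cycles = 1 + wait_states                  # one access cycle plus the waits
    nanoseconds = cycles * 1e9 / CPU_CLOCK_HZ
    return wait_states, cycles, nanoseconds

# The OS total of 9 wait states corresponds to 4 external wait states here.
print(parallel_flash_read(4))
```

Under these assumptions, the OS setting of 9 total wait states gives 10 cycles per read.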
Serial Flash
Calculators produced starting in 2019 with revision M no longer use a parallel flash chip; they use an SPI flash chip instead. This chip needs much longer to retrieve the first byte from a random address, but can stream sequential data at a reasonable rate. A new ASIC design takes advantage of this capability by adding a cache between the flash chip and the CPU.
According to tests performed by Jacobly, the cache is 8 KB in size, structured as a 2-way set-associative cache with 128 sets of 32-byte lines, the set being selected by address bits 5-11. A fetch from the same cache line as the previous access costs 1 wait state (so the read takes a total of 2 cycles), a fetch from a different cached line costs 2 wait states, and a cache miss costs 194-200 wait states.
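To make the geometry concrete, here is a minimal sketch of how an address would split into tag, set index, and byte offset under the layout described above. The function name and the choice of bits 12 and up as the tag are assumptions for illustration; only the set index bits (5-11) and line size come from the test results.

```python
LINE_SIZE = 32    # bytes per cache line (address bits 0-4)
NUM_SETS = 128    # sets, selected by address bits 5-11
NUM_WAYS = 2      # lines per set

CACHE_BYTES = LINE_SIZE * NUM_SETS * NUM_WAYS   # 8192 bytes, i.e. 8 KB

def cache_fields(addr):
    """Split a flash address into (tag, set index, offset within the line)."""
    offset = addr & (LINE_SIZE - 1)
    set_index = (addr >> 5) & (NUM_SETS - 1)
    tag = addr >> 12   # assumption: everything above the set index acts as the tag
    return tag, set_index, offset

# Addresses 0x1000 (4096) bytes apart land in the same set, so more than
# two such lines cannot be cached at the same time.
print(cache_fields(0x000020), cache_fields(0x001020), cache_fields(0x002020))
```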
Code execution from flash displays high locality of reference, so amortized performance should be reasonably good, although each cache miss costs as much as roughly 100 reads from the same cache line. For flash-resident data files, programmers should keep related data together to maximize cache utilization. A good access pattern can make reading data from flash twice as fast as reading it from RAM (2 cycles per read instead of 4), while a bad pattern that misses on nearly every access (around 200 cycles per read) is substantially slower than even the old parallel flash (10 cycles per read with the OS setting of 9 wait states).
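The effect of access patterns can be estimated with a rough cost model. The sketch below is not a description of the real hardware: it assumes least-recently-used replacement within each set (the actual policy is not documented here), charges the upper bound of 200 wait states for every miss, and ignores any benefit from streaming sequential lines. All names are hypothetical.

```python
LINE_SIZE, NUM_SETS, NUM_WAYS = 32, 128, 2
SAME_LINE_WS, OTHER_LINE_WS, MISS_WS = 1, 2, 200   # wait states per read

def flash_read_wait_states(addresses):
    """Total wait states for a sequence of single-byte flash reads."""
    sets = [[] for _ in range(NUM_SETS)]   # tags per set, most recently used last
    last_line = None
    total = 0
    for addr in addresses:
        line = addr // LINE_SIZE
        index = line % NUM_SETS            # address bits 5-11
        tag = line // NUM_SETS
        ways = sets[index]
        if tag in ways:
            total += SAME_LINE_WS if line == last_line else OTHER_LINE_WS
            ways.remove(tag)
        else:
            total += MISS_WS
            if len(ways) == NUM_WAYS:
                ways.pop(0)                # assumed LRU eviction
        ways.append(tag)
        last_line = line
    return total

cold = flash_read_wait_states(range(1024))              # first pass: one miss per line
warm = flash_read_wait_states(list(range(1024)) * 2) - cold
thrash = flash_read_wait_states([0x0000, 0x1000, 0x2000] * 10)
print(cold, warm, thrash)
```

In this model a second pass over the same 1 KB averages just over 1 wait state per byte, compared with 3 for RAM, while cycling through three lines that share a set misses on every access.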
LCD DMA
The LCD controller uses Direct Memory Access (DMA) to retrieve pixels from RAM. Since the CPU and the LCD controller cannot access RAM at the same time, the DMA asynchronously causes additional wait states during CPU accesses to RAM. The rate of these wait states appears to be directly proportional to the rate of data being sent to the screen, so lower bits-per-pixel modes reduce the overall performance hit.
Wait State Layout
Address Range | Read | Write | Description |
000000-3FFFFF | 5+ | Crash | Parallel flash: Wait states are controlled by 1005, adding to the minimum of 5. The OS sets a total of 9 wait states. |
000000-3FFFFF | 1-200 | Crash | Serial flash: See above discussion. |
400000-7FFFFF | 257 | Crash | Parallel flash: Unmapped address space. Can be mapped to Flash using 1002, after which Flash wait states are active. |
400000-BFFFFF | 1-200 | Crash | Serial flash: Flash mirrors. Up to 12 MB of total flash can be mapped using 182E. |
800000-CFFFFF | 257 | 257 | Parallel flash: Unmapped address space. |
C00000-CFFFFF | 1 | 1 | Serial flash: Unmapped address space. |
D00000-D3FFFF | 3 | 1 | RAM |
D40000-D657FF | 3 | 1 | VRAM |
D65800-D72BFF | 3 | 1 | Unmapped address space. Reads garbage. |
D72C00-D7FFFF | 3 | 1 | Unmapped address space. Reads garbage (revisions up to at least C) or mirror of D52C00-D5FFFF (at least I-S). |
D80000-DFFFFF | 3 | 1 | Mirror of D00000-D7FFFF |
Not mapped | 1 | 1 | Port range 0000 (mirrored every 0100 bytes) |
E00000-E0FFFF | 1 | 1 | Memory-mapped port range 1000 (mirrored every 0100 bytes) |
E10000-E1FFFF | 1 | 1 | Memory-mapped port range 2000 (mirrored every 0100 bytes) |
E20000-E2FFFF | 3 | 3 | Memory-mapped port range 3000 (mirrored every 0200 bytes) |
E20000-E2FFFF | 9-12 | 9-12 | Port range 3000: extra cycles if bit 7 or 8 of the address is set, when the CPU is running at 48 MHz. |
E20000-E2FFFF | 6-8 | 6-8 | Same at 24 MHz. |
E20000-E2FFFF | 5-6 | 5-6 | Same at 12 MHz. |
E20000-E2FFFF | 4-5 | 4-5 | Same at 6 MHz. |
E30000-E3FFFF | 2 | 1 | Memory-mapped port range 4000 (mirrored every 1000 bytes) |
E30000-E3FFFF | 2 | 20-21/22 | Special timing for 4000-400F & 4018-401B (pre-M/M+) at 48 MHz |
E30000-E3FFFF | 2 | 15/13 | Same at 24 MHz |
E30000-E3FFFF | 2 | 11 | Same at 12 MHz |
E30000-E3FFFF | 2 | 9 | Same at 6 MHz |
E30000-E3FFFF | 3 | 3 | Special timing for 4200-43FF (any CPU speed) |
E30000-E3FFFF | 1 | 15-16/13 | Special timing for 4C00-4DFF (pre-M/M+) at 48 MHz |
E30000-E3FFFF | 1 | 12/10 | Same at 24 MHz |
E30000-E3FFFF | 1 | 10/8 | Same at 12 MHz |
E30000-E3FFFF | 1 | 8 | Same at 6 MHz |
E40000-EFFFFF | 1 | 1 | Unmapped port range (reads all zeros) |
F00000-F0FFFF | 2 | 2 | Memory-mapped port range 5000 (mirrored every 0100 bytes) |
F10000-F1FFFF | 2 | 2 | Memory-mapped port range 6000 (mirrored every 0020 bytes) |
F20000-F2FFFF | 2 | 2 | Memory-mapped port range 7000 (mirrored every 0100 bytes) |
F30000-F3FFFF | 2 | 2 | Memory-mapped port range 8000 (mirrored every 0080 bytes) |
F40000-F4FFFF | 2 | 2 | Memory-mapped port range 9000 (mirrored every 1000 bytes, possibly protected port range) |
F50000-F5FFFF | 2 | 2 | Memory-mapped port range A000 (mirrored every 0080 bytes) |
F60000-F6FFFF | 2 | 2 | Memory-mapped port range B000 (mirrored every 1000 bytes) |
F70000-F7FFFF | 2 | 2 | Memory-mapped port range C000 (mirrored every 0100 bytes) |
F80000-F8FFFF | 2 | 2 | Memory-mapped port range D000 (mirrored every 0080 bytes) |
F90000-F9FFFF | 2 | 2 | Memory-mapped port range E000 (mirrored every 0080 bytes) |
FA0000-FAFFFF | 2 | 2 | Memory-mapped port range F000 (mirrored every 0100 bytes) |
FB0000-FEFFFF | 2 | 2 | Unmapped port range (reads all zeros) |
FF0000-FFFFFF | 1 | 1 | Unmapped port range (reads all zeros) |
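For quick estimates, the fixed entries of the table can be put into a small lookup. This is only a sketch: it covers the regions whose wait states do not depend on hardware revision, CPU speed, or flash configuration, merges adjacent rows with identical values, and returns None for everything else (flash, the flash mirrors, and port ranges 3000 and 4000). The names are hypothetical.

```python
# (first address, last address, read wait states, write wait states)
FIXED_REGIONS = [
    (0xD00000, 0xDFFFFF, 3, 1),   # RAM, VRAM, unmapped space and their mirror
    (0xE00000, 0xE1FFFF, 1, 1),   # port ranges 1000 and 2000
    (0xE40000, 0xEFFFFF, 1, 1),   # unmapped port range
    (0xF00000, 0xFAFFFF, 2, 2),   # port ranges 5000 through F000
    (0xFB0000, 0xFEFFFF, 2, 2),   # unmapped port range
    (0xFF0000, 0xFFFFFF, 1, 1),   # unmapped port range
]

def wait_states(addr):
    """Return (read, write) wait states, or None if the table gives no fixed value."""
    for first, last, read_ws, write_ws in FIXED_REGIONS:
        if first <= addr <= last:
            return read_ws, write_ws
    return None

def read_cycles(addr):
    """Cycles for one read: the access itself plus its wait states."""
    entry = wait_states(addr)
    return None if entry is None else 1 + entry[0]

print(wait_states(0xD00000), read_cycles(0xD00000))
```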