84PCE:Wait States

From WikiTI
Revision as of 11:47, 18 April 2020 by Dr. D'nar (Talk | contribs)


Synopsis

The eZ80 processor is able to perform a memory access in a single cycle. However, on the TI-84+CE, accesses actually take longer because of wait states, each of which adds one cycle to the access. For example, a read from RAM has 3 wait states and therefore takes 4 cycles. The wait states for parallel flash accesses can be customized, but it is unknown whether that is the case for other memory regions or for the newer serial flash.

Flash Access Wait States

Parallel Flash

Calculators produced prior to revision M use a parallel flash chip. These chips are limited to a maximum read rate under 20 MHz, necessitating at least 1 wait state for the ASIC's faster 48 MHz clock. The ASIC's internal memory mapping hardware adds a minimum of 5 wait states just to ferry the request to the flash chip and the response back to the CPU, but additional external wait states are required for the flash chip to make its reply. See 84PCE:Ports:1005 for more information.
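As a rough illustration of the arithmetic above, the following sketch converts a wait state count into cycles and nanoseconds at the 48 MHz clock. The function names and the external_ws parameter are hypothetical, and external_ws is a plain count of extra wait states, not the raw value written to port 1005. The figure of 4 external wait states is derived from the total of 9 that the wait state table on this page attributes to the OS.

```python
INTERNAL_WS = 5          # minimum added by the ASIC's memory mapping hardware
CPU_HZ = 48_000_000      # ASIC clock mentioned in the text


def flash_read_cycles(external_ws):
    """Total cycles for one parallel flash read: 1 base cycle plus all wait states."""
    if external_ws < 0:
        raise ValueError("external wait states cannot be negative")
    return 1 + INTERNAL_WS + external_ws


def flash_read_ns(external_ws, cpu_hz=CPU_HZ):
    """Duration of one read in nanoseconds at the given clock."""
    return flash_read_cycles(external_ws) * 1e9 / cpu_hz


# The OS total of 9 wait states corresponds to 4 external ones in this model.
os_external_ws = 9 - INTERNAL_WS
print(flash_read_cycles(os_external_ws), "cycles per read")
print(round(flash_read_ns(os_external_ws), 1), "ns per read")
```

Under these assumptions each flash read with the OS setting costs 10 cycles, compared with 4 cycles for a RAM read.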

Serial Flash

Calculators produced starting in 2019 with revision M no longer use a parallel flash chip, but instead an SPI flash chip. This flash chip needs much longer to initially retrieve data from a random address, but can stream sequential data at a reasonable rate. A new ASIC design takes advantage of this capability by adding a cache between the flash chip and CPU.

According to tests performed by Jacobly, the cache is 8 K in size, structured as a 2-way set associative cache with 128 sets of 32 bytes chosen by address bits 5-11. A fetch from the same cache line as previously accessed costs 1 wait state (so the read takes a total of 2 cycles), a fetch from a different cache line costs 2 wait states, and a cache miss costs 194-200 wait states.

Code execution from flash displays high locality of reference, so amortized performance should be reasonably good, although the roughly 200-cycle penalty for a cache miss probably hurts a fair bit. For flash-resident data files, programmers should try to organize them to keep related data together to maximize cache utilization. A good access pattern can make accessing data in flash twice as fast as getting it from RAM (2 cycles per read from the current cache line versus 4 cycles per RAM read), while a bad pattern can make it substantially worse than even pulling it from the old parallel flash.

LCD DMA

The LCD controller uses Direct Memory Access to retrieve the pixels from RAM. However, since the CPU and the LCD controller cannot access RAM at the same time, the DMA asynchronously causes additional wait states during RAM accesses. The rate of wait states caused by the DMA appears to be directly proportional to the rate of data being sent to the screen, so lower bit-per-pixel modes will reduce the general performance hit.
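To give a feel for how the data rate scales with colour depth, the sketch below computes the amount of pixel data fetched per frame for a 320x240 screen. The function names are hypothetical, and the 60 Hz refresh rate is an assumption used only to turn a per-frame figure into a per-second one. It does not predict the actual number of wait states, only the relative DMA load.

```python
WIDTH, HEIGHT = 320, 240
ASSUMED_REFRESH_HZ = 60   # assumption, not taken from this page


def frame_bytes(bits_per_pixel):
    """Bytes of pixel data the LCD controller fetches for one frame."""
    return WIDTH * HEIGHT * bits_per_pixel // 8


def dma_bytes_per_second(bits_per_pixel, refresh_hz=ASSUMED_REFRESH_HZ):
    """Approximate DMA traffic under the assumed refresh rate."""
    return frame_bytes(bits_per_pixel) * refresh_hz


def relative_dma_load(bits_per_pixel, baseline_bpp=16):
    """DMA traffic relative to the 16 bpp mode."""
    return frame_bytes(bits_per_pixel) / frame_bytes(baseline_bpp)


for bpp in (16, 8, 4):
    print(bpp, frame_bytes(bpp), relative_dma_load(bpp))
```

A 16 bpp frame is 153600 bytes, which matches the size of the VRAM range D40000-D657FF listed in the table below. If the wait state rate really is proportional to the data rate, an 8 bpp mode would be expected to cause about half the DMA stalls of the 16 bpp mode.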

Wait State Layout

Address Range    Read wait states    Write wait states    Description
000000-3FFFFF 5+ Crash Parallel flash: Wait states are controlled by 1005, adding to the minimum of 5. The OS sets a total of 9 wait states.
000000-3FFFFF 1-200 Crash Serial flash: See above discussion.
400000-7FFFFF 257 Crash Parallel flash: Unmapped address space. Can be mapped to Flash using 1002, after which Flash wait states are active.
400000-BFFFFF 1-200 Crash Serial flash: Flash mirrors. Appears to be subject to the flash cache timing discussed above.
800000-CFFFFF 257 257 Parallel flash: Unmapped address space.
C00000-CFFFFF 1 1 Serial flash: Unmapped address space.
D00000-D3FFFF 3 1 RAM
D40000-D657FF 3 1 VRAM
D65800-D7FFFF 3 1 Unmapped address space. Reads garbage.
D80000-DFFFFF 3 1 Mirror of D00000-D7FFFF
Not mapped 1 1 Port range 0000 (mirrored every 0100 bytes)
E00000-E0FFFF 1 1 Memory-mapped port range 1000 (mirrored every 0100 bytes)
E10000-E1FFFF 1 1 Memory-mapped port range 2000 (mirrored every 0100 bytes)
E20000-E2FFFF 3 3 Memory-mapped port range 3000 (mirrored every 0200 bytes)
E20000-E2FFFF 9-12 9-12 Extra cycles if bit 7 or 8 of the address is set, when the CPU is running at 48 MHz.
E20000-E2FFFF 6-8 6-8 Same at 24 MHz.
E20000-E2FFFF 5-6 5-6 Same at 12 MHz.
E20000-E2FFFF 4-5 4-5 Same at 6 MHz.

E30000-E3FFFF 2 1 Memory-mapped port range 4000 (mirrored every 1000 bytes)
E40000-EFFFFF 1 1 Unmapped port range (reads all zeros)
F00000-F0FFFF 2 2 Memory-mapped port range 5000 (mirrored every 0100 bytes)
F10000-F1FFFF 2 2 Memory-mapped port range 6000 (mirrored every 0020 bytes)
F20000-F2FFFF 2 2 Memory-mapped port range 7000 (mirrored every 0100 bytes)
F30000-F3FFFF 2 2 Memory-mapped port range 8000 (mirrored every 0080 bytes)
F40000-F4FFFF 2 2 Memory-mapped port range 9000 (mirrored every 1000 bytes, possibly protected port range)
F50000-F5FFFF 2 2 Memory-mapped port range A000 (mirrored every 0080 bytes)
F60000-F6FFFF 2 2 Memory-mapped port range B000 (mirrored every 1000 bytes)
F70000-F7FFFF 2 2 Memory-mapped port range C000 (mirrored every 0100 bytes)
F80000-F8FFFF 2 2 Memory-mapped port range D000 (mirrored every 0080 bytes)
F90000-F9FFFF 2 2 Memory-mapped port range E000 (mirrored every 0080 bytes)
FA0000-FAFFFF 2 2 Memory-mapped port range F000 (reads all zeros)
FB0000-FEFFFF 2 2 Unmapped port range (reads all zeros)
FF0000-FFFFFF 1 1 Unmapped port range (reads all zeros)
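For cycle counting, the fixed-cost rows of the table can be folded into a small lookup. This is a sketch with hypothetical names: adjacent rows with identical timings are merged, and the flash ranges and the E20000-E2FFFF range are left out because their cost depends on the flash type, cache state, or CPU speed.

```python
# (first address, last address, read wait states, write wait states)
FIXED_REGIONS = [
    (0xD00000, 0xDFFFFF, 3, 1),   # RAM, VRAM, unmapped space, and mirror
    (0xE00000, 0xE1FFFF, 1, 1),   # port ranges 1000 and 2000
    (0xE30000, 0xE3FFFF, 2, 1),   # port range 4000
    (0xE40000, 0xEFFFFF, 1, 1),   # unmapped port range
    (0xF00000, 0xFEFFFF, 2, 2),   # port ranges 5000-F000 and unmapped
    (0xFF0000, 0xFFFFFF, 1, 1),   # unmapped port range
]


def wait_states(address):
    """Return (read, write) wait states, or None if the cost is not fixed."""
    for first, last, read_ws, write_ws in FIXED_REGIONS:
        if first <= address <= last:
            return read_ws, write_ws
    return None


def read_cycles(address):
    """Cycles for one read: 1 base cycle plus the read wait states."""
    ws = wait_states(address)
    return None if ws is None else 1 + ws[0]


def write_cycles(address):
    """Cycles for one write: 1 base cycle plus the write wait states."""
    ws = wait_states(address)
    return None if ws is None else 1 + ws[1]


print(read_cycles(0xD00000), write_cycles(0xD40000))
```

The lookup returns None rather than a guess for any address whose timing the table gives as a range or as configuration-dependent.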