Difference between revisions of "84PCE:Wait States"

From WikiTI
(Add info about LCD DMA)
(Detailed timing for LCD controller)
(19 intermediate revisions by 4 users not shown)
Line 1: Line 1:
 
[[Category:84PCE:General_Hardware_Information|Wait States]]
 
[[Category:84PCE:General_Hardware_Information|Wait States]]
 
== Synopsis ==
 
== Synopsis ==
The eZ80 processor is able to perform a memory access in a single cycle. However, on the TI-84+CE, accesses will actually take longer due to wait states. For example, a read from RAM will take 4 cycles, because it has 3 wait states. The wait states for Flash accesses can be customized, but it is unknown whether that is the case for other memory regions.
+
The eZ80 processor is able to perform a memory access in a single cycle. However, on the TI-84+CE, accesses will actually take longer due to wait states. For example, a read from RAM will take 4 cycles, because it has 3 wait states. The wait states for parallel Flash accesses can be customized, but it is unknown whether that is the case for other memory regions or for the newer serial flash.
 +
 
 +
== Flash Access Wait States ==
 +
 
 +
=== Parallel Flash ===
 +
Calculators produced prior to revision M use a parallel flash chip. These chips are limited to a maximum read rate under 20 MHz, necessitating at least 1 wait state for the ASIC's faster 48 MHz clock. The ASIC's internal memory mapping hardware adds a minimum of 5 wait states just to ferry the request to the flash chip and the response back to the CPU, but additional external wait states are required for the flash chip to make its reply. See [[84PCE:Ports:1005]] for more information.
 +
 
 +
=== Serial Flash ===
 +
Calculators produced starting in 2019 with revision M no longer use a parallel flash chip, but instead an SPI flash chip. This flash chip needs much longer to initially retrieve data from a random address, but can stream sequential data at a reasonable rate. A new ASIC design takes advantage of this capability by adding a cache between the flash chip and CPU.
 +
 
 +
According to tests performed by [[User:Jacobly|Jacobly]], the cache is 8 K in size, structured as a 2-way set associative cache with 128 sets of 32 bytes chosen by address bits 5-11. A fetch from the same cache line as previously accessed costs 1 wait state (so the read takes a total of 2 cycles), a fetch from a different cache line costs 2 wait states, and a cache miss costs 194-200 wait states.
 +
 
 +
Code execution from flash displays high locality of reference, so amortized performance should be reasonably good, although the 200 cycle penalty for a cache miss probably hurts a fair bit. For flash-resident data files, programmers should try to organize them to keep related data together to maximize cache utilization. A good access pattern can make accessing data in flash twice as fast as getting it from RAM, while a bad pattern can make it substantially worse than even pulling it from the old parallel flash.
  
 
== LCD DMA ==
 
== LCD DMA ==
The LCD controller uses Direct Memory Access to retrieve the pixels from memory. However, since the CPU and the LCD controller cannot access memory at the same time, there are some waitstates caused asynchronously by the DMA. The rate of waitstates caused by the DMA appears to be directly proportional to the rate of data being sent to the screen, so lower bit-per-pixel modes will reduce the general performance hit.
+
The LCD controller uses Direct Memory Access to retrieve the pixels from RAM. However, since the CPU and the LCD controller cannot access RAM at the same time, there are some waitstates caused asynchronously by the DMA during RAM accesses. The rate of waitstates caused by the DMA appears to be directly proportional to the rate of data being sent to the screen, so lower bit-per-pixel modes will reduce the general performance hit.
  
 
== Wait State Layout ==
 
== Wait State Layout ==
Line 16: Line 28:
 
|5+
 
|5+
 
|Crash
 
|Crash
|Flash wait states are controlled by [[:84PCE:Ports:1005|1005]], adding to the minimum of 5.
+
|[[84PCE:Ports:1000|Parallel flash]]: Wait states are controlled by [[:84PCE:Ports:1005|1005]], adding to the minimum of 5. The OS sets a total of 9 wait states.
 +
|-
 +
|000000-3FFFFF
 +
|1-200
 +
|Crash
 +
|[[84PCE:OS:Serial_Flash_Commands|Serial flash:]] See above discussion.
 
|-
 
|-
 
|400000-7FFFFF
 
|400000-7FFFFF
 
|257
 
|257
 
|Crash
 
|Crash
|Unmapped address space. Can be mapped to Flash using [[:84PCE:Ports:1002|1002]], after which Flash wait states are active.
+
|Parallel flash: Unmapped address space. Can be mapped to Flash using [[:84PCE:Ports:1002|1002]], after which Flash wait states are active.
 +
|-
 +
|400000-BFFFFF
 +
|1-200
 +
|Crash
 +
|Serial flash: Flash mirrors. Appears to be subject to flash cache timing as discussed above
 
|-
 
|-
 
|800000-CFFFFF
 
|800000-CFFFFF
 
|257
 
|257
 
|257
 
|257
|Unmapped address space.
+
|Parallel flash: Unmapped address space.
 +
|-
 +
|C00000-CFFFFF
 +
|1
 +
|1
 +
|Serial flash: Unmapped address space.
 
|-
 
|-
 
|D00000-D3FFFF
 
|D00000-D3FFFF
Line 38: Line 65:
 
|VRAM
 
|VRAM
 
|-
 
|-
|D65800-D7FFFF
+
|D65800-D72BFF
 
|3
 
|3
 
|1
 
|1
 
|Unmapped address space. Reads garbage.
 
|Unmapped address space. Reads garbage.
 +
|-
 +
|D72C00-D7FFFF
 +
|3
 +
|1
 +
|Unmapped address space. Reads garbage (revisions up to at least C) or mirror of D52C00-D5FFFF (at least I+).
 
|-
 
|-
 
|D80000-DFFFFF
 
|D80000-DFFFFF
 
|3
 
|3
 
|1
 
|1
|Mirror of D00000-D3FFFF
+
|Mirror of D00000-D7FFFF
 +
|-
 +
|Not mapped
 +
|1
 +
|1
 +
|Port range [[:84PCE:Ports:0000|0000]] (mirrored every 0100 bytes)
 
|-
 
|-
 
|E00000-E0FFFF
 
|E00000-E0FFFF
 
|1
 
|1
 
|1
 
|1
|Memory-mapped port range 1000 (mirrored every 0100 bytes)
+
|Memory-mapped port range [[:84PCE:Ports:1000|1000]] (mirrored every 0100 bytes)
 
|-
 
|-
 
|E10000-E1FFFF
 
|E10000-E1FFFF
 
|1
 
|1
 
|1
 
|1
|Memory-mapped port range 2000 (reads all zeros)
+
|Memory-mapped port range [[:84PCE:Ports:2000|2000]] (mirrored every 0100 bytes)
 
|-
 
|-
 
|E20000-E2FFFF
 
|E20000-E2FFFF
 
|3
 
|3
 +
 +
9-12
 +
 +
6-8
 +
 +
5-6
 +
 +
4-5
 
|3
 
|3
 +
 +
9-12
 +
 +
6-8
 +
 +
5-6
 +
 +
4-5
 
|Memory-mapped port range [[:84PCE:Ports:3000|3000]] (mirrored every 0200 bytes)
 
|Memory-mapped port range [[:84PCE:Ports:3000|3000]] (mirrored every 0200 bytes)
 +
 +
Extra cycles if bit 7 ''or'' 8 of the address is set, when cpu is running at 48 MHz.
 +
 +
Same at 24 MHz.
 +
 +
Same at 12 MHz.
 +
 +
Same at 6 MHz.
 
|-
 
|-
 
|E30000-E3FFFF
 
|E30000-E3FFFF
 
|2
 
|2
 +
 +
2
 +
 +
2
 +
 +
2
 +
 +
2
 +
 +
3
 +
 +
1
 +
 +
1
 +
 +
1
 +
 +
1
 
|1
 
|1
|Memory-mapped port range [[:84PCE:Ports:4000|4000]] (not mirrored, one contiguous virtual address space)
+
 
 +
20-21/22
 +
 
 +
15/13
 +
 
 +
11
 +
 
 +
9
 +
 
 +
3
 +
 
 +
15-16/13
 +
 
 +
12/10
 +
 
 +
10/8
 +
 
 +
8
 +
|Memory-mapped port range [[:84PCE:Ports:4000|4000]] (mirrored every 1000 bytes)
 +
 
 +
Special timing for 4000-400F & 4018-401B (pre-M/M+) at 48 MHz
 +
 
 +
Same at 24 MHz
 +
 
 +
Same at 12 MHz
 +
 
 +
Same at 6 MHz
 +
 
 +
Special timing for 4200-43FF (any CPU speed)
 +
 
 +
Special timing for 4C00-4DFF (pre-M/M+) at 48 MHz
 +
 
 +
Same at 24 MHz
 +
 
 +
Same at 12 MHz
 +
 
 +
Same at 6 MHz
 
|-
 
|-
 
|E40000-EFFFFF
 
|E40000-EFFFFF
Line 96: Line 211:
 
|2
 
|2
 
|2
 
|2
|Memory-mapped port range 9000 (not mirrored, possibly protected port range)
+
|Memory-mapped port range [[:84PCE:Ports:9000|9000]] (mirrored every 1000 bytes, possibly protected port range)
 
|-
 
|-
 
|F50000-F5FFFF
 
|F50000-F5FFFF
Line 106: Line 221:
 
|2
 
|2
 
|2
 
|2
|Memory-mapped port range B000 (not mirrored, all zeros after around F60040)
+
|Memory-mapped port range [[:84PCE:Ports:B000|B000]] (mirrored every 1000 bytes)
 
|-
 
|-
 
|F70000-F7FFFF
 
|F70000-F7FFFF
 
|2
 
|2
 
|2
 
|2
|Memory-mapped port range C000 (mirrored every 0100 bytes)
+
|Memory-mapped port range [[:84PCE:Ports:C000|C000]] (mirrored every 0100 bytes)
 
|-
 
|-
 
|F80000-F8FFFF
 
|F80000-F8FFFF
 
|2
 
|2
 
|2
 
|2
|Memory-mapped port range D000 (mirrored every 0080 bytes)
+
|Memory-mapped port range [[:84PCE:Ports:D000|D000]] (mirrored every 0080 bytes)
 
|-
 
|-
 
|F90000-F9FFFF
 
|F90000-F9FFFF
 
|2
 
|2
 
|2
 
|2
|Memory-mapped port range E000 (mirrored every 0080 bytes)
+
|Memory-mapped port range [[:84PCE:Ports:E000|E000]] (mirrored every 0080 bytes)
 
|-
 
|-
 
|FA0000-FAFFFF
 
|FA0000-FAFFFF
 
|2
 
|2
 
|2
 
|2
|Memory-mapped port range F000 (reads all zeros)
+
|Memory-mapped port range [[:84PCE:Ports:F000|F000]] (mirrored every 0100 bytes)
 
|-
 
|-
 
|FB0000-FEFFFF
 
|FB0000-FEFFFF

Revision as of 23:51, 22 February 2021

Synopsis

The eZ80 processor can perform a memory access in a single cycle. On the TI-84+CE, however, accesses take longer because of wait states: each wait state adds one cycle, so a read from RAM, which has 3 wait states, takes 4 cycles. The wait states for parallel Flash accesses can be customized, but it is unknown whether the same is true for other memory regions or for the newer serial flash.

Flash Access Wait States

Parallel Flash

Calculators produced prior to revision M use a parallel flash chip. These chips are limited to a maximum read rate under 20 MHz, necessitating at least 1 wait state for the ASIC's faster 48 MHz clock. The ASIC's internal memory mapping hardware adds a minimum of 5 wait states just to ferry the request to the flash chip and the response back to the CPU, but additional external wait states are required for the flash chip to make its reply. See 84PCE:Ports:1005 for more information.
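
As a rough illustration of what these wait-state counts mean in time, the sketch below converts a wait-state count into cycles, nanoseconds and an effective read rate at the 48 MHz clock. It assumes only the rule from the synopsis (one cycle per access plus one per wait state) and uses the two totals given on this page (the minimum of 5, and the 9 set by the OS). The function names are made up for the example.

```python
CPU_CLOCK_HZ = 48_000_000  # ASIC clock quoted in the text


def access_cycles(wait_states: int) -> int:
    """One cycle for the access itself plus one cycle per wait state."""
    return 1 + wait_states


def access_time_ns(wait_states: int, clock_hz: int = CPU_CLOCK_HZ) -> float:
    """Duration of a single access in nanoseconds."""
    return access_cycles(wait_states) * 1e9 / clock_hz


def effective_read_rate_mhz(wait_states: int, clock_hz: int = CPU_CLOCK_HZ) -> float:
    """Back-to-back accesses per second, expressed in MHz."""
    return clock_hz / access_cycles(wait_states) / 1e6


if __name__ == "__main__":
    # 5 = internal minimum, 9 = total set by the OS (both taken from this page)
    for ws in (5, 9):
        print(ws, access_cycles(ws), access_time_ns(ws), effective_read_rate_mhz(ws))
```

With the OS total of 9 wait states, each flash read takes 10 cycles, which corresponds to an effective rate of 4.8 MHz.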

Serial Flash

Calculators produced starting in 2019 with revision M no longer use a parallel flash chip, but instead an SPI flash chip. This flash chip needs much longer to initially retrieve data from a random address, but can stream sequential data at a reasonable rate. A new ASIC design takes advantage of this capability by adding a cache between the flash chip and CPU.

According to tests performed by Jacobly, the cache is 8 KiB in size, structured as a 2-way set associative cache with 128 sets of 32-byte lines (128 × 2 × 32 = 8192 bytes); the set is chosen by address bits 5-11. A fetch from the same cache line as the previous access costs 1 wait state (so the read takes a total of 2 cycles), a fetch from a different cached line costs 2 wait states, and a cache miss costs 194-200 wait states.
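
The geometry described above can be turned into a small model for estimating read costs. This is only a sketch: the replacement policy is not stated on this page, so least-recently-used eviction is assumed, a miss is charged the upper figure of 200 wait states, and the names `decode` and `FlashCacheModel` are invented for the example.

```python
LINE_BYTES = 32   # bytes per cache line
NUM_SETS = 128    # selected by address bits 5-11
NUM_WAYS = 2      # 2-way set associative


def decode(addr: int):
    """Split an address into (tag, set index, byte offset)."""
    offset = addr & (LINE_BYTES - 1)          # bits 0-4
    set_index = (addr >> 5) & (NUM_SETS - 1)  # bits 5-11
    tag = addr >> 12                          # remaining upper bits
    return tag, set_index, offset


class FlashCacheModel:
    SAME_LINE_WS = 1    # same line as the previous access
    OTHER_LINE_WS = 2   # different line that is already cached
    MISS_WS = 200       # upper end of the measured 194-200 range

    def __init__(self):
        # Each set holds up to NUM_WAYS tags, most recently used last.
        self.sets = [[] for _ in range(NUM_SETS)]
        self.last_line = None

    def read(self, addr: int) -> int:
        """Return the cycle count (1 + wait states) for one read."""
        tag, index, _ = decode(addr)
        line = addr >> 5
        ways = self.sets[index]
        if line == self.last_line:
            wait_states = self.SAME_LINE_WS
        elif tag in ways:
            wait_states = self.OTHER_LINE_WS
        else:
            wait_states = self.MISS_WS
            if len(ways) == NUM_WAYS:
                ways.pop(0)  # ASSUMPTION: evict the least recently used way
        if tag in ways:
            ways.remove(tag)
        ways.append(tag)     # mark as most recently used
        self.last_line = line
        return 1 + wait_states
```

Feeding a trace of addresses through `read` and summing the results gives a rough cycle estimate for that access pattern. Three lines that map to the same set (for example addresses 0x0000, 0x1000 and 0x2000) will keep evicting each other in this model.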

Code execution from flash displays high locality of reference, so amortized performance should be reasonably good, although the roughly 200-cycle penalty for a cache miss is likely to be significant. Programmers should organize flash-resident data files so that related data stays together, to maximize cache utilization. A good access pattern can make accessing data in flash twice as fast as getting it from RAM (2 cycles per read instead of 4), while a bad pattern can make it substantially worse than even pulling it from the old parallel flash.
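
To put numbers on the trade-off, the following sketch computes the average cost of reading flash through the cache under simple assumptions: every byte is read individually, there is no prefetching, a miss costs the upper figure of 200 wait states, and hits are same-line hits unless stated otherwise. The function names are illustrative only.

```python
RAM_READ_CYCLES = 4        # 3 wait states + 1
SAME_LINE_HIT_CYCLES = 2   # 1 wait state + 1
MISS_CYCLES = 201          # 200 wait states + 1 (upper end of 194-200)
LINE_BYTES = 32


def cold_sequential_cycles_per_byte() -> float:
    """Streaming uncached data: one miss, then 31 same-line hits per line."""
    return (MISS_CYCLES + (LINE_BYTES - 1) * SAME_LINE_HIT_CYCLES) / LINE_BYTES


def average_cycles(miss_ratio: float, hit_cycles: int = SAME_LINE_HIT_CYCLES) -> float:
    """Average cycles per read for a given fraction of reads that miss."""
    return miss_ratio * MISS_CYCLES + (1 - miss_ratio) * hit_cycles


def break_even_miss_ratio(hit_cycles: int = SAME_LINE_HIT_CYCLES) -> float:
    """Miss ratio at which cached flash reads cost the same as RAM reads."""
    return (RAM_READ_CYCLES - hit_cycles) / (MISS_CYCLES - hit_cycles)


if __name__ == "__main__":
    print(cold_sequential_cycles_per_byte())
    print(break_even_miss_ratio())
```

Under these assumptions, a first pass over uncached sequential data averages a little over 8 cycles per byte, about twice the cost of RAM, and flash only beats RAM when roughly 1% or fewer of the reads miss.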

LCD DMA

The LCD controller uses Direct Memory Access to retrieve pixel data from RAM. Since the CPU and the LCD controller cannot access RAM at the same time, the DMA asynchronously introduces extra wait states during CPU RAM accesses. The rate of these wait states appears to be directly proportional to the rate of data being sent to the screen, so lower bit-per-pixel modes reduce the overall performance hit.
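
If the DMA overhead really scales with the amount of pixel data, the relative load of each colour depth can be estimated from the frame size alone. The sketch below assumes a 320x240 panel, which is not stated on this page but matches the size of the VRAM region in the table below at 16 bits per pixel. It compares data volume only and does not predict actual wait-state counts.

```python
WIDTH, HEIGHT = 320, 240  # ASSUMPTION: panel resolution, not stated on this page


def frame_bytes(bits_per_pixel: int) -> int:
    """Bytes the LCD controller must fetch from RAM for one full frame."""
    return WIDTH * HEIGHT * bits_per_pixel // 8


def relative_dma_load(bits_per_pixel: int, baseline_bpp: int = 16) -> float:
    """DMA traffic relative to the baseline colour depth."""
    return frame_bytes(bits_per_pixel) / frame_bytes(baseline_bpp)


if __name__ == "__main__":
    for bpp in (16, 8, 4, 2, 1):
        print(bpp, frame_bytes(bpp), relative_dma_load(bpp))
```

For example, an 8 bpp mode fetches half as much data per frame as the 16 bpp mode, so under the proportionality noted above it should cause about half the DMA-induced stalls.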

Wait State Layout

Address Range    Read (wait states)    Write (wait states)    Description
000000-3FFFFF 5+ Crash Parallel flash: Wait states are controlled by 1005, adding to the minimum of 5. The OS sets a total of 9 wait states.
000000-3FFFFF 1-200 Crash Serial flash: See above discussion.
400000-7FFFFF 257 Crash Parallel flash: Unmapped address space. Can be mapped to Flash using 1002, after which Flash wait states are active.
400000-BFFFFF 1-200 Crash Serial flash: Flash mirrors. Appears to be subject to the flash cache timing discussed above.
800000-CFFFFF 257 257 Parallel flash: Unmapped address space.
C00000-CFFFFF 1 1 Serial flash: Unmapped address space.
D00000-D3FFFF 3 1 RAM
D40000-D657FF 3 1 VRAM
D65800-D72BFF 3 1 Unmapped address space. Reads garbage.
D72C00-D7FFFF 3 1 Unmapped address space. Reads garbage (revisions up to at least C) or mirror of D52C00-D5FFFF (at least I+).
D80000-DFFFFF 3 1 Mirror of D00000-D7FFFF
Not mapped 1 1 Port range 0000 (mirrored every 0100 bytes)
E00000-E0FFFF 1 1 Memory-mapped port range 1000 (mirrored every 0100 bytes)
E10000-E1FFFF 1 1 Memory-mapped port range 2000 (mirrored every 0100 bytes)
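E20000-E2FFFF 3 3 Memory-mapped port range 3000 (mirrored every 0200 bytes)
E20000-E2FFFF 9-12 9-12 Extra cycles if bit 7 or 8 of the address is set, when the CPU is running at 48 MHz.
E20000-E2FFFF 6-8 6-8 Extra cycles if bit 7 or 8 of the address is set, when the CPU is running at 24 MHz.
E20000-E2FFFF 5-6 5-6 Extra cycles if bit 7 or 8 of the address is set, when the CPU is running at 12 MHz.
E20000-E2FFFF 4-5 4-5 Extra cycles if bit 7 or 8 of the address is set, when the CPU is running at 6 MHz.
E30000-E3FFFF 2 1 Memory-mapped port range 4000 (mirrored every 1000 bytes)
E30000-E3FFFF 2 20-21/22 Special timing for 4000-400F & 4018-401B (pre-M/M+) at 48 MHz
E30000-E3FFFF 2 15/13 Special timing for 4000-400F & 4018-401B (pre-M/M+) at 24 MHz
E30000-E3FFFF 2 11 Special timing for 4000-400F & 4018-401B (pre-M/M+) at 12 MHz
E30000-E3FFFF 2 9 Special timing for 4000-400F & 4018-401B (pre-M/M+) at 6 MHz
E30000-E3FFFF 3 3 Special timing for 4200-43FF (any CPU speed)
E30000-E3FFFF 1 15-16/13 Special timing for 4C00-4DFF (pre-M/M+) at 48 MHz
E30000-E3FFFF 1 12/10 Special timing for 4C00-4DFF (pre-M/M+) at 24 MHz
E30000-E3FFFF 1 10/8 Special timing for 4C00-4DFF (pre-M/M+) at 12 MHz
E30000-E3FFFF 1 8 Special timing for 4C00-4DFF (pre-M/M+) at 6 MHz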
E40000-EFFFFF 1 1 Unmapped port range (reads all zeros)
F00000-F0FFFF 2 2 Memory-mapped port range 5000 (mirrored every 0100 bytes)
F10000-F1FFFF 2 2 Memory-mapped port range 6000 (mirrored every 0020 bytes)
F20000-F2FFFF 2 2 Memory-mapped port range 7000 (mirrored every 0100 bytes)
F30000-F3FFFF 2 2 Memory-mapped port range 8000 (mirrored every 0080 bytes)
F40000-F4FFFF 2 2 Memory-mapped port range 9000 (mirrored every 1000 bytes, possibly protected port range)
F50000-F5FFFF 2 2 Memory-mapped port range A000 (mirrored every 0080 bytes)
F60000-F6FFFF 2 2 Memory-mapped port range B000 (mirrored every 1000 bytes)
F70000-F7FFFF 2 2 Memory-mapped port range C000 (mirrored every 0100 bytes)
F80000-F8FFFF 2 2 Memory-mapped port range D000 (mirrored every 0080 bytes)
F90000-F9FFFF 2 2 Memory-mapped port range E000 (mirrored every 0080 bytes)
FA0000-FAFFFF 2 2 Memory-mapped port range F000 (mirrored every 0100 bytes)
FB0000-FEFFFF 2 2 Unmapped port range (reads all zeros)
FF0000-FFFFFF 1 1 Unmapped port range (reads all zeros)
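
For quick estimates, the fixed-timing rows of the table can be encoded as a lookup. The sketch below covers only regions with a single read/write wait-state pair. Flash (which depends on the hardware revision and cache state) and port ranges 3000 and 4000 (which depend on the sub-range and CPU speed) are deliberately left out and return `None`. The names `Region`, `find_region` and `access_cycles` are illustrative, not part of any existing tool.

```python
from typing import NamedTuple, Optional


class Region(NamedTuple):
    start: int
    end: int
    read_ws: int
    write_ws: int
    name: str


# Only rows of the table with one fixed read/write wait-state pair.
FIXED_REGIONS = [
    Region(0xD00000, 0xD3FFFF, 3, 1, "RAM"),
    Region(0xD40000, 0xD657FF, 3, 1, "VRAM"),
    Region(0xD65800, 0xD72BFF, 3, 1, "Unmapped (reads garbage)"),
    Region(0xD72C00, 0xD7FFFF, 3, 1, "Unmapped (garbage or mirror, by revision)"),
    Region(0xD80000, 0xDFFFFF, 3, 1, "Mirror of D00000-D7FFFF"),
    Region(0xE00000, 0xE0FFFF, 1, 1, "Port range 1000"),
    Region(0xE10000, 0xE1FFFF, 1, 1, "Port range 2000"),
    Region(0xE40000, 0xEFFFFF, 1, 1, "Unmapped port range"),
    Region(0xF00000, 0xFAFFFF, 2, 2, "Port ranges 5000-F000"),
    Region(0xFB0000, 0xFEFFFF, 2, 2, "Unmapped port range"),
    Region(0xFF0000, 0xFFFFFF, 1, 1, "Unmapped port range"),
]


def find_region(addr: int) -> Optional[Region]:
    """Return the fixed-timing region containing addr, or None."""
    for region in FIXED_REGIONS:
        if region.start <= addr <= region.end:
            return region
    return None


def access_cycles(addr: int, write: bool = False) -> Optional[int]:
    """Cycles for one access (1 + wait states), or None if timing varies."""
    region = find_region(addr)
    if region is None:
        return None
    wait_states = region.write_ws if write else region.read_ws
    return 1 + wait_states
```

For instance, `access_cycles(0xD00000)` gives 4 for a RAM read and `access_cycles(0xD00000, write=True)` gives 2 for a RAM write, before any additional stalls from LCD DMA.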