<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="https://wikiti.brandonw.net/skins/common/feed.css?303"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
		<id>https://wikiti.brandonw.net/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Erfan+K</id>
		<title>WikiTI - User contributions [en]</title>
		<link rel="self" type="application/atom+xml" href="https://wikiti.brandonw.net/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Erfan+K"/>
		<link rel="alternate" type="text/html" href="https://wikiti.brandonw.net/index.php?title=Special:Contributions/Erfan_K"/>
		<updated>2026-04-05T21:57:05Z</updated>
		<subtitle>User contributions</subtitle>
		<generator>MediaWiki 1.23.5</generator>

	<entry>
		<id>https://wikiti.brandonw.net/index.php?title=Z80_Routines:Math:Square_root</id>
		<title>Z80 Routines:Math:Square root</title>
		<link rel="alternate" type="text/html" href="https://wikiti.brandonw.net/index.php?title=Z80_Routines:Math:Square_root"/>
				<updated>2011-01-28T21:13:08Z</updated>
		
		<summary type="html">&lt;p&gt;Erfan K: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Z80 Routines:Math|Square root]]&lt;br /&gt;
[[Category:Z80 Routines|Square root]]&lt;br /&gt;
&lt;br /&gt;
==Size Optimization==&lt;br /&gt;
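This version is optimized for size. It subtracts the odd numbers 1, 3, 5, ... from HL until the subtraction underflows; since the sum of the first n odd numbers is n*n, this amounts to comparing HL against each perfect square in turn until a larger one is found. It is slow for large inputs, but it gets the job done in only 12 bytes.&lt;br /&gt;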
 &amp;lt;nowiki&amp;gt;;-------------------------------&lt;br /&gt;
;Square Root&lt;br /&gt;
;Inputs:&lt;br /&gt;
;HL = number to be square rooted&lt;br /&gt;
;Outputs:&lt;br /&gt;
;A  = square root&lt;br /&gt;
&lt;br /&gt;
sqrt:&lt;br /&gt;
   ld a,$ff&lt;br /&gt;
   ld de,1&lt;br /&gt;
sqrtloop:&lt;br /&gt;
   inc a&lt;br /&gt;
   dec e&lt;br /&gt;
   dec de&lt;br /&gt;
   add hl,de&lt;br /&gt;
   jr c,sqrtloop&lt;br /&gt;
   ret &amp;lt;/nowiki&amp;gt;&lt;br /&gt;
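As a usage sketch (the input value 1000 is only an illustration, and the call assumes the routine above is assembled under its label sqrt), the loop counts how many of the subtractions succeed. Note that HL and DE are both overwritten.&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;    ld hl,1000      ;value to take the root of&lt;br /&gt;
    call sqrt       ;A = 31, since 31*31 = 961 and 32*32 = 1024&lt;br /&gt;
    ;HL and DE no longer hold their old values here &amp;lt;/nowiki&amp;gt;&lt;br /&gt;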
&lt;br /&gt;
&lt;br /&gt;
==Speed Optimization==&lt;br /&gt;
This version uses the digit-by-digit method taught in high school for finding a square root by hand, so it is much faster, running in about 850 T-states.  Unfortunately, it takes 180 bytes and, being fully unrolled, is hard to follow.&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;;-------------------------------&lt;br /&gt;
;Square Root&lt;br /&gt;
;Inputs:&lt;br /&gt;
;DE = number to be square rooted&lt;br /&gt;
;Outputs:&lt;br /&gt;
;A  = square root&lt;br /&gt;
&lt;br /&gt;
sqrt:&lt;br /&gt;
    xor a&lt;br /&gt;
    ld h,a&lt;br /&gt;
    ld l,a&lt;br /&gt;
    ld b,a&lt;br /&gt;
    rl d&lt;br /&gt;
    rl l&lt;br /&gt;
    rl d&lt;br /&gt;
    rl l&lt;br /&gt;
    ld c,1&lt;br /&gt;
    sbc hl,bc&lt;br /&gt;
    jp c,$+3+2+1&lt;br /&gt;
    sbc hl,bc&lt;br /&gt;
    inc a&lt;br /&gt;
    add hl,bc&lt;br /&gt;
    add a,a&lt;br /&gt;
    rl d&lt;br /&gt;
    rl l&lt;br /&gt;
    rl d&lt;br /&gt;
    rl l&lt;br /&gt;
    ld c,a&lt;br /&gt;
    scf&lt;br /&gt;
    rl c&lt;br /&gt;
    sbc hl,bc&lt;br /&gt;
    jp c,$+3+2+1&lt;br /&gt;
    sbc hl,bc&lt;br /&gt;
    inc a&lt;br /&gt;
    add hl,bc&lt;br /&gt;
    add a,a&lt;br /&gt;
    rl d&lt;br /&gt;
    rl l&lt;br /&gt;
    rl d&lt;br /&gt;
    rl l&lt;br /&gt;
    ld c,a&lt;br /&gt;
    scf&lt;br /&gt;
    rl c&lt;br /&gt;
    sbc hl,bc&lt;br /&gt;
    jp c,$+3+2+1&lt;br /&gt;
    sbc hl,bc&lt;br /&gt;
    inc a&lt;br /&gt;
    add hl,bc&lt;br /&gt;
    add a,a&lt;br /&gt;
    rl d&lt;br /&gt;
    rl l&lt;br /&gt;
    rl d&lt;br /&gt;
    rl l&lt;br /&gt;
    ld c,a&lt;br /&gt;
    scf&lt;br /&gt;
    rl c&lt;br /&gt;
    sbc hl,bc&lt;br /&gt;
    jp c,$+3+2+1&lt;br /&gt;
    sbc hl,bc&lt;br /&gt;
    inc a&lt;br /&gt;
    add hl,bc&lt;br /&gt;
    add a,a&lt;br /&gt;
    rl e&lt;br /&gt;
    adc hl,hl&lt;br /&gt;
    rl e&lt;br /&gt;
    adc hl,hl&lt;br /&gt;
    ld c,a&lt;br /&gt;
    scf&lt;br /&gt;
    rl c&lt;br /&gt;
    sbc hl,bc&lt;br /&gt;
    jp c,$+3+2+1&lt;br /&gt;
    sbc hl,bc&lt;br /&gt;
    inc a&lt;br /&gt;
    add hl,bc&lt;br /&gt;
    add a,a&lt;br /&gt;
    rl e&lt;br /&gt;
    adc hl,hl&lt;br /&gt;
    rl e&lt;br /&gt;
    adc hl,hl&lt;br /&gt;
    ld c,a&lt;br /&gt;
    scf&lt;br /&gt;
    rl c&lt;br /&gt;
    sbc hl,bc&lt;br /&gt;
    jp c,$+3+2+1&lt;br /&gt;
    sbc hl,bc&lt;br /&gt;
    inc a&lt;br /&gt;
    add hl,bc&lt;br /&gt;
    add a,a&lt;br /&gt;
    rl e&lt;br /&gt;
    adc hl,hl&lt;br /&gt;
    rl e&lt;br /&gt;
    adc hl,hl&lt;br /&gt;
    ld c,a&lt;br /&gt;
    scf&lt;br /&gt;
    rl c&lt;br /&gt;
    sbc hl,bc&lt;br /&gt;
    jp c,$+3+2+1&lt;br /&gt;
    sbc hl,bc&lt;br /&gt;
    inc a&lt;br /&gt;
    add hl,bc&lt;br /&gt;
    add a,a&lt;br /&gt;
    rl e&lt;br /&gt;
    adc hl,hl&lt;br /&gt;
    rl e&lt;br /&gt;
    adc hl,hl&lt;br /&gt;
    ld c,a&lt;br /&gt;
    scf&lt;br /&gt;
    rl c&lt;br /&gt;
    rl b&lt;br /&gt;
    sbc hl,bc&lt;br /&gt;
    jp c,$+3+2+1&lt;br /&gt;
    sbc hl,bc&lt;br /&gt;
    inc a&lt;br /&gt;
    add hl,bc&lt;br /&gt;
    ret&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
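Each of the eight unrolled stages above performs one step of the binary digit-by-digit method. As a rough outline in commented pseudocode (the names R, Q and T are labels chosen here for illustration and do not appear in the routine):&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;;R = remainder so far, Q = root so far, both start at 0&lt;br /&gt;
;repeat 8 times, taking the input two bits at a time from the top:&lt;br /&gt;
;   R = 4*R + next two input bits&lt;br /&gt;
;   T = 4*Q + 1                 ;trial value&lt;br /&gt;
;   if T fits into R:  R = R - T, Q = 2*Q + 1&lt;br /&gt;
;   otherwise:         Q = 2*Q&lt;br /&gt;
;after the last step Q is the integer square root&lt;br /&gt;
;&lt;br /&gt;
;example, input 10 (bit pairs 00 00 00 00 00 00 10 10):&lt;br /&gt;
;   steps 1-6: R = 0, Q = 0&lt;br /&gt;
;   step 7:    R = 2, T = 1, so R = 1, Q = 1&lt;br /&gt;
;   step 8:    R = 6, T = 5, so R = 1, Q = 3&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
In the routine, HL plays the role of R, BC holds T, and A accumulates Q.&lt;br /&gt;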
&lt;br /&gt;
&lt;br /&gt;
==Balanced Optimization==&lt;br /&gt;
This version strikes a balance between speed and size. It uses the same high school method, but in a loop; it runs in under 1200 T-states and takes only 41 bytes.&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;;-------------------------------&lt;br /&gt;
;Square Root&lt;br /&gt;
;Inputs:&lt;br /&gt;
;DE = number to be square rooted&lt;br /&gt;
;Outputs:&lt;br /&gt;
;A  = square root&lt;br /&gt;
&lt;br /&gt;
Sqrt:&lt;br /&gt;
    ld hl,0&lt;br /&gt;
    ld c,l&lt;br /&gt;
    ld b,h&lt;br /&gt;
    ld a,8&lt;br /&gt;
Sqrtloop:&lt;br /&gt;
    sla e&lt;br /&gt;
    rl d&lt;br /&gt;
    adc hl,hl&lt;br /&gt;
    sla e&lt;br /&gt;
    rl d&lt;br /&gt;
    adc hl,hl&lt;br /&gt;
    scf               ;scf / rl c could be replaced by the&lt;br /&gt;
    rl c              ;undocumented SL1 (SLL) instruction&lt;br /&gt;
    rl b&lt;br /&gt;
    sbc hl,bc&lt;br /&gt;
    jr nc,Sqrtaddbit&lt;br /&gt;
    add hl,bc&lt;br /&gt;
    dec c&lt;br /&gt;
Sqrtaddbit:&lt;br /&gt;
    inc c&lt;br /&gt;
    res 0,c&lt;br /&gt;
    dec a&lt;br /&gt;
    jr nz,Sqrtloop&lt;br /&gt;
    ld a,c&lt;br /&gt;
    rr b&lt;br /&gt;
    rra&lt;br /&gt;
    ret&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Presumably the best ==&lt;br /&gt;
&lt;br /&gt;
This code comes from Z80 Bits. It is faster than all three versions above and smaller than the last two: it runs in under 720 T-states (under 640 if fully unrolled) and takes a mere 29 bytes. On the other hand, it takes a somewhat unconventional input: it computes the square root of the 16-bit number formed by the registers L and A (written LA, with L as the high byte and A as the low byte) and places the result in D.&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
sqrt_la:&lt;br /&gt;
	ld	de, 0040h	; 40h appends &amp;quot;01&amp;quot; to D&lt;br /&gt;
	ld	h, d&lt;br /&gt;
	&lt;br /&gt;
	ld	b, 7&lt;br /&gt;
	&lt;br /&gt;
	; need to clear the carry beforehand&lt;br /&gt;
	; (or a does this without changing A, the low input byte)&lt;br /&gt;
	or	a&lt;br /&gt;
	&lt;br /&gt;
_loop:&lt;br /&gt;
	sbc	hl, de&lt;br /&gt;
	jr	nc, $+3&lt;br /&gt;
	add	hl, de&lt;br /&gt;
	ccf&lt;br /&gt;
	rl	d&lt;br /&gt;
	rla&lt;br /&gt;
	adc	hl, hl&lt;br /&gt;
	rla&lt;br /&gt;
	adc	hl, hl&lt;br /&gt;
	&lt;br /&gt;
	djnz	_loop&lt;br /&gt;
	&lt;br /&gt;
	sbc	hl, de		; optimised last iteration&lt;br /&gt;
	ccf&lt;br /&gt;
	rl	d&lt;br /&gt;
	&lt;br /&gt;
	ret&lt;br /&gt;
 &amp;lt;/nowiki&amp;gt;&lt;br /&gt;
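To call it with the same interface as the routines above, a small wrapper can shuffle the registers. This is only a sketch: the label sqrt_de is made up here, and it assumes L is the high byte and A the low byte of the input, as the name LA suggests.&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;;Inputs:  DE = number to be square rooted&lt;br /&gt;
;Outputs: A  = square root&lt;br /&gt;
sqrt_de:&lt;br /&gt;
	ld	l, d		; high byte of the input&lt;br /&gt;
	ld	a, e		; low byte of the input&lt;br /&gt;
	call	sqrt_la&lt;br /&gt;
	ld	a, d		; move the result to A&lt;br /&gt;
	ret&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
The wrapper adds 7 bytes plus the cost of one call and return; B, DE and HL are overwritten.&lt;br /&gt;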
&lt;br /&gt;
&lt;br /&gt;
==Other Options==&lt;br /&gt;
A binary search of a table of squares would give a much better best case, with a worst case similar to the high school method. However, it would require a 512-byte table (256 two-byte entries), making it significantly larger than the other routines.  Of course, the table could also double as a fast way to square a number.&lt;br /&gt;
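For the squaring use, a lookup could look like the sketch below. The labels square and SquareTable, and the table layout (256 little-endian words, entry n holding n*n), are assumptions made for this example; the page does not define the table.&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;;Inputs:  A  = number to square&lt;br /&gt;
;Outputs: HL = A*A&lt;br /&gt;
square:&lt;br /&gt;
    ld l,a&lt;br /&gt;
    ld h,0&lt;br /&gt;
    add hl,hl          ;two bytes per entry&lt;br /&gt;
    ld de,SquareTable  ;hypothetical 512 byte table&lt;br /&gt;
    add hl,de&lt;br /&gt;
    ld a,(hl)          ;low byte of the square&lt;br /&gt;
    inc hl&lt;br /&gt;
    ld h,(hl)          ;high byte of the square&lt;br /&gt;
    ld l,a&lt;br /&gt;
    ret&amp;lt;/nowiki&amp;gt;&lt;br /&gt;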
&lt;br /&gt;
== Credits and Contributions ==&lt;br /&gt;
* '''James Montelongo'''&lt;br /&gt;
* '''Milos &amp;quot;baze&amp;quot; Bazelides''' (or possibly one of the contributors to [http://baze.au.com/misc/z80bits.html z80bits])&lt;/div&gt;</summary>
		<author><name>Erfan K</name></author>	</entry>

	</feed>