LIBMODULES
File contents:
dramtransfer.py - contains main helpers for DRAM access.
bfloat16nncore.py - contains the neural network core processing.
bfloat16processor.py - contains the bfloat16nn processing paths.
A note on the square root logic: I have implemented a variant of Goldschmidt's algorithm which allows for up to 3.5% error, trading accuracy for speed. If you need more accuracy, you will have to implement Newton-Raphson in software, or perhaps use doubles with external library calls. Example:
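// Newton-Raphson approximation (6 digits after decimal ok)
#include <math.h> // fabsf
#define MAXITERATION 128
#define ACCURACY 1E-6f // 1E-16 is below what a float can resolve
float f = <value>; // the value to take the square root of (must be > 0)
float approx = 0.5f * f; // 1st approximation
float betterapprox;
for (int i = 0; i < MAXITERATION; i++) {
    betterapprox = 0.5f * (approx + f / approx); // Newton-Raphson step
    if (fabsf(betterapprox - approx) < ACCURACY)
        break; // converged
    approx = betterapprox;
}
// betterapprox now holds the square root estimate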
systime.py - contains system time support
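For readers curious about the Goldschmidt iteration mentioned in the square root note above, here is a minimal software sketch of the textbook form of the algorithm. It is not the variant implemented in bfloat16processor.py; the function name `goldschmidt_sqrt`, the `iterations` parameter, and the crude starting estimate of 1.0 are all assumptions made for illustration. It assumes a finite, non-negative input.

```c
#include <math.h>

/* Hypothetical helper (not part of this repository): textbook Goldschmidt
 * square root. Fewer iterations run faster but leave a larger error. */
static float goldschmidt_sqrt(float s, int iterations)
{
    if (s <= 0.0f)
        return 0.0f; /* assumption: treat zero and negatives as 0 */

    /* Split s = m * 2^e and force e even, so that m lies in [0.5, 2). */
    int e;
    float m = frexpf(s, &e);
    if (e & 1) {
        m *= 2.0f;
        e -= 1;
    }

    float y = 1.0f;     /* crude initial estimate of 1/sqrt(m) */
    float x = m * y;    /* converges to sqrt(m) */
    float h = 0.5f * y; /* converges to 1/(2*sqrt(m)) */

    for (int i = 0; i < iterations; i++) {
        float r = 0.5f - x * h; /* residual, shrinks quadratically */
        x = x + x * r;
        h = h + h * r;
    }

    /* Undo the scaling: sqrt(m * 2^e) = sqrt(m) * 2^(e/2). */
    return ldexpf(x, e / 2);
}
```

With this starting estimate, about six iterations are enough to reach single-precision accuracy; a hardware implementation would more likely use a better initial guess and fewer iterations, which is where a speed-versus-accuracy trade-off like the one described above comes from.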