LIBMODULES
File contents:
dramtransfer.py - contains the main helpers for DRAM access.
bfloat16nncore.py - contains the neural network core processing.
bfloat16processor.py - contains the bfloat16nn processing paths.
A note on the square root logic: I have implemented a variant of Goldschmidt's algorithm that allows up to ⚠ 3.5% error, but there is simply no replacement for speed! If you need more accuracy, you will have to implement Newton-Raphson in software, or perhaps use doubles with external library calls. Example:
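// Newton-Raphson square root (float gives about 6-7 significant digits)
#include <math.h>
#define MAXITERATION 128
#define ACCURACY 1E-6f  // relative tolerance; 1E-16 is far below float precision
float nr_sqrt(float f) {  // f: whatever you want to calculate
    if (f < 0.0f) return NAN;    // no real square root
    if (f == 0.0f) return 0.0f;  // avoids 0/0 in the update below
    float approx = 0.5f * f;     // 1st approximation
    float betterapprox = approx;
    for (int i = 0; i < MAXITERATION; i++) {
        betterapprox = 0.5f * (approx + f / approx);
        if (fabsf(betterapprox - approx) < ACCURACY * betterapprox)
            break;  // converged to within the relative tolerance
        approx = betterapprox;
    }
    return betterapprox;
}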
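For comparison with the Newton-Raphson loop, here is a minimal software sketch of the textbook Goldschmidt square-root iteration. It is not the bfloat16 implementation in bfloat16processor.py. The names `gs_sqrt` and `GS_ITERATIONS`, as well as the linear starting estimate, are hypothetical choices made for illustration. Goldschmidt refines x toward sqrt(s) and h toward 1/(2*sqrt(s)) at the same time using only multiplications and additions, with no division, which is what makes it attractive for fast paths.

```c
#include <math.h>

/* Hypothetical sketch of Goldschmidt's square root on software floats.
 * Not the bfloat16 hardware path; names and constants are illustrative. */
#define GS_ITERATIONS 4  /* assumed; fewer iterations trade accuracy for speed */

float gs_sqrt(float s) {
    if (s < 0.0f) return NAN;    /* no real square root */
    if (s == 0.0f) return 0.0f;

    /* Split s = m * 2^e with m in [0.5, 1), then make e even so that
     * sqrt(s) = sqrt(m) * 2^(e/2) with m in [0.25, 1). */
    int e;
    float m = frexpf(s, &e);
    if (e & 1) {
        m *= 0.5f;
        e += 1;
    }

    /* Crude linear estimate of 1/sqrt(m) on [0.25, 1). */
    float y = 2.2f - 1.2f * m;

    float x = m * y;     /* converges to sqrt(m)       */
    float h = 0.5f * y;  /* converges to 1/(2*sqrt(m)) */
    for (int i = 0; i < GS_ITERATIONS; i++) {
        float r = 0.5f - x * h;  /* residual: zero once x*h == 1/2 */
        x += x * r;
        h += h * r;
    }
    return ldexpf(x, e / 2);  /* undo the power-of-two scaling */
}
```

Because x/h stays fixed at 2m while x*h is driven toward 1/2, x converges quadratically to sqrt(m). With this particular starting estimate, a single iteration leaves a worst-case error of a few percent. Four iterations reach roughly float precision.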
systime.py - contains system time support
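bfloat16, the format these modules are named after, is the upper half of an IEEE 754 binary32 float: 1 sign bit, the same 8-bit exponent, and 7 stored mantissa bits. The C sketch below converts between the two formats using round-to-nearest-even. The helper names `float_to_bf16` and `bf16_to_float` are hypothetical and not part of this repository. Whether the hardware paths here round or simply truncate is not stated, so the rounding mode is an assumption.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: binary32 <-> bfloat16 (sign, 8-bit exponent,
 * top 7 mantissa bits). Rounding mode is an assumption. */

uint16_t float_to_bf16(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);          /* reinterpret without aliasing UB */
    if ((bits & 0x7FFFFFFFu) > 0x7F800000u)  /* NaN: keep it a quiet NaN */
        return (uint16_t)((bits >> 16) | 0x0040u);
    /* Round to nearest, ties to even, on the 16 low bits being dropped. */
    uint32_t rounding_bias = 0x7FFFu + ((bits >> 16) & 1u);
    return (uint16_t)((bits + rounding_bias) >> 16);
}

float bf16_to_float(uint16_t h) {
    uint32_t bits = (uint32_t)h << 16;  /* dropped mantissa bits come back as zero */
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Widening back to float is exact. All of the precision loss happens in the narrowing step, which is why a bfloat16 square root with a few percent error can still be acceptable for neural-network workloads.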