
LIBMODULES

File contents:

dramtransfer.py - main helpers for DRAM access.

bfloat16nncore.py - neural-network core processing.

bfloat16processor.py - the bfloat16nn processing paths.
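
For background, bfloat16 is simply the upper 16 bits of an IEEE-754 float32 (1 sign bit, 8 exponent bits, 7 mantissa bits). A minimal truncating conversion in Python, for illustration only; these helper names are not part of the modules above:

```python
import struct

def float_to_bfloat16_bits(x: float) -> int:
    """Truncate a float32 to its 16-bit bfloat16 pattern (round-toward-zero)."""
    bits = struct.unpack('>I', struct.pack('>f', x))[0]  # float32 -> uint32
    return bits >> 16                                    # keep the top half

def bfloat16_bits_to_float(b: int) -> float:
    """Expand a bfloat16 bit pattern back to a float by zero-filling the low 16 bits."""
    return struct.unpack('>f', struct.pack('>I', (b & 0xFFFF) << 16))[0]
```

Truncation (rather than round-to-nearest) loses up to one low mantissa bit of accuracy, but is what makes bfloat16 hardware paths cheap: the exponent range of float32 is kept unchanged.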

A note on the square root logic: I have implemented a variant of Goldschmidt's algorithm, which allows for up to 3.5% error, but there is simply no replacement for speed! If you need more accuracy, you will have to implement Newton-Raphson in software, or perhaps use doubles with external library calls. Example:

    // Newton-Raphson square-root approximation
    // (about 6 correct digits after the decimal point)
    #include <math.h>

    #define MAXITERATION 128
    #define ACCURACY 1E-16

    float f = <value>; // whatever you want the square root of
    float approx = 0.5f * f;  // first approximation
    float betterapprox;
    for (int i = 0; i < MAXITERATION; i++) {
        betterapprox = 0.5f * (approx + f / approx);
        if (fabsf(betterapprox - approx) < ACCURACY)
            break;
        approx = betterapprox;
    }
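
The Goldschmidt iteration mentioned above can be sketched in Python as follows. This is an illustrative software model only, not the module's actual implementation; the exponent-based seed (via `math.frexp`) and the iteration count are assumptions — a hardware core would typically seed from a small lookup table and run far fewer iterations:

```python
import math

def goldschmidt_sqrt(s, iterations=8):
    """Approximate sqrt(s) with Goldschmidt's algorithm (illustrative sketch).

    x converges to sqrt(s) and h to 1/(2*sqrt(s)); each step needs only
    multiplies and adds, which is why the method maps well to fast hardware.
    """
    # Crude seed for 1/sqrt(s) taken from the exponent alone (assumption:
    # real cores use a lookup table for a tighter starting value).
    m, e = math.frexp(s)        # s = m * 2**e with 0.5 <= m < 1
    y = 2.0 ** (-e // 2)        # rough 1/sqrt(s)
    x = s * y                   # running estimate of sqrt(s)
    h = 0.5 * y                 # running estimate of 1/(2*sqrt(s))
    for _ in range(iterations):
        r = 0.5 - x * h         # residual; zero at the fixed point
        x = x + x * r
        h = h + h * r
    return x
```

Cutting the loop to one or two iterations with this crude seed reproduces the few-percent error regime described above; extra iterations buy accuracy at the cost of latency.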

systime.py - system time support.