Member "firefox-69.0.1/media/kiss_fft/README.simd" (17 Sep 2019, 2419 Bytes) of package /linux/www/firefox-69.0.1.source.tar.xz:


If you are reading this, you are probably interested in using the SIMD extensions in kissfft
to do 4 *separate* FFTs at once.

Beware! Beyond here there be dragons!

This API is not easy to use, is not well documented, and breaks the KISS principle.


Still reading? Okay, you may be rewarded for your patience with a considerable speedup
(2-3x) on Intel x86 machines with SSE if you are willing to jump through some hoops.

The basic idea is to use the packed 4-float __m128 data type as a scalar element.
This makes the data format fairly convoluted: each fft call performs 4 FFTs, one on each of the signals A, B, C and D.

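To illustrate the "vector as scalar" idea, here is a minimal portable sketch. Lane4 is a
hypothetical stand-in for __m128 (it is not part of kissfft), and cmul is an illustrative
complex multiply written once over a generic scalar type. Instantiated with Lane4, one call
multiplies four independent complex pairs, which is roughly how one code path can yield four FFTs.

```cpp
#include <cassert>

// Hypothetical stand-in for __m128: four independent float lanes.
struct Lane4 {
    float v[4];
};

inline Lane4 operator+(const Lane4& a, const Lane4& b) {
    Lane4 r;
    for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] + b.v[i];
    return r;
}

inline Lane4 operator-(const Lane4& a, const Lane4& b) {
    Lane4 r;
    for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] - b.v[i];
    return r;
}

inline Lane4 operator*(const Lane4& a, const Lane4& b) {
    Lane4 r;
    for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] * b.v[i];
    return r;
}

// Complex number over any scalar type S (a plain float, or a 4-lane type).
template <typename S>
struct Cpx {
    S r;
    S i;
};

// (a.r + j*a.i) * (b.r + j*b.i), written once for every scalar type.
template <typename S>
Cpx<S> cmul(const Cpx<S>& a, const Cpx<S>& b) {
    return Cpx<S>{ a.r * b.r - a.i * b.i, a.r * b.i + a.i * b.r };
}
```

Lane k of the result depends only on lane k of the inputs, so the four signals never mix.
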
Complex data is interleaved as follows:
rA0,rB0,rC0,rD0,      iA0,iB0,iC0,iD0,   rA1,rB1,rC1,rD1, iA1,iB1,iC1,iD1 ...
where "rA0" is the real part of the zeroth sample of signal A and "iA0" is its imaginary part.

Real-only data is laid out as:
rA0,rB0,rC0,rD0,     rA1,rB1,rC1,rD1,      ...

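The complex layout above boils down to a little index arithmetic: sample k of signal s
(0=A .. 3=D) has its real part at float offset 8*k + s and its imaginary part at 8*k + 4 + s.
The sketch below spells that out in portable C++. The names pack_complex4 and unpack_sample
are hypothetical helpers, not kissfft functions, and std::vector is used only for brevity:
a buffer handed to the SIMD build would also have to be 16-byte aligned.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Interleave four complex signals of length n into the layout
// rA0,rB0,rC0,rD0, iA0,iB0,iC0,iD0, rA1,... (8 floats per sample index).
std::vector<float> pack_complex4(const std::complex<float>* const sig[4],
                                 std::size_t n) {
    std::vector<float> out(8 * n);
    for (std::size_t k = 0; k < n; ++k) {
        for (int s = 0; s < 4; ++s) {
            out[8 * k + s]     = sig[s][k].real();  // real parts of A..D
            out[8 * k + 4 + s] = sig[s][k].imag();  // imaginary parts of A..D
        }
    }
    return out;
}

// Read sample k of signal s (0=A .. 3=D) back out of a packed buffer.
std::complex<float> unpack_sample(const std::vector<float>& packed,
                                  int s, std::size_t k) {
    return { packed[8 * k + s], packed[8 * k + 4 + s] };
}
```
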
Compile with gcc flags along the lines of
-O3 -mpreferred-stack-boundary=4  -DUSE_SIMD=1 -msse

Be aware of SIMD alignment.  This is the most likely cause of segfaults.
The code within kissfft uses scratch variables on the stack.
With SIMD, these must have addresses on 16-byte boundaries.
Search for "SIMD alignment" for more info.

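When chasing an alignment segfault, it can help to check addresses directly. The following is
a minimal sketch in standard C++11. is_aligned16 and Packed4 are hypothetical names used only
for illustration, not part of kissfft. Heap buffers need an aligned allocator instead
(for example _mm_malloc on compilers that provide it).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True if p sits on a 16-byte boundary, as aligned SSE loads and stores require.
inline bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Four floats forced onto a 16-byte boundary with C++11 alignas.
// Arrays of this type keep every element aligned, on the stack or elsewhere.
struct alignas(16) Packed4 {
    float lane[4];
};
```

Asserting is_aligned16 on every buffer right before it goes to the FFT turns a hard-to-trace
crash into an immediate, readable failure.
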
Robin at Divide Concept was kind enough to share his code for formatting data to/from the SIMD kissfft layout.
I have not run it -- use it at your own risk.  It appears to do 4xN and Nx4 transpositions
(out of place).

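// Interleave four float signals, stored back to back in source (size128 samples
// each), into size128 packed values: A0,B0,C0,D0, A1,B1,C1,D1, ...
// target must be 16-byte aligned.  Requires <xmmintrin.h>.
void SSETools::pack128(float* target, float* source, unsigned long size128)
{
   __m128* pDest = (__m128*)target;
   __m128* pDestEnd = pDest+size128;
   float* source0=source;
   float* source1=source0+size128;
   float* source2=source1+size128;
   float* source3=source2+size128;

   while(pDest<pDestEnd)
   {
       // _mm_set_ps takes its arguments from the highest lane down to the
       // lowest, so *source0 lands in the first float of the packed value.
       *pDest=_mm_set_ps(*source3,*source2,*source1,*source0);
       source0++;
       source1++;
       source2++;
       source3++;
       pDest++;
   }
}
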
// Inverse of pack128: split size128 packed groups of 4 floats in source back
// into four signals stored back to back in target (size128 samples each).
void SSETools::unpack128(float* target, float* source, unsigned long size128)
{
   float* pSrc = source;
   float* pSrcEnd = pSrc+size128*4;
   float* target0=target;
   float* target1=target0+size128;
   float* target2=target1+size128;
   float* target3=target2+size128;

   while(pSrc<pSrcEnd)
   {
       // pSrc[0..3] are sample k of signals A, B, C and D respectively.
       *target0=pSrc[0];
       *target1=pSrc[1];
       *target2=pSrc[2];
       *target3=pSrc[3];
       target0++;
       target1++;
       target2++;
       target3++;
       pSrc+=4;
   }
}
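
Since the routines above are untested, a scalar reference can serve as a cross-check. The sketch
below assumes the calling convention read off the code: the unpacked buffer holds signals
A, B, C, D back to back, each n floats long, and n plays the role of size128. pack_ref and
unpack_ref are hypothetical names, and the code uses no SSE at all.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar reference for the packing step: the source holds A, B, C, D back to
// back (n floats each); the result interleaves them as A0,B0,C0,D0, A1,...
std::vector<float> pack_ref(const std::vector<float>& source, std::size_t n) {
    std::vector<float> packed(4 * n);
    for (std::size_t k = 0; k < n; ++k)
        for (std::size_t s = 0; s < 4; ++s)
            packed[4 * k + s] = source[s * n + k];
    return packed;
}

// Scalar reference for the unpacking step: the inverse transposition.
std::vector<float> unpack_ref(const std::vector<float>& packed, std::size_t n) {
    std::vector<float> target(4 * n);
    for (std::size_t k = 0; k < n; ++k)
        for (std::size_t s = 0; s < 4; ++s)
            target[s * n + k] = packed[4 * k + s];
    return target;
}
```

Comparing the output of the SSE routines against these on a few small inputs is a cheap way to
gain confidence before using them in earnest.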