About

The purpose of this blog is to share interesting and valuable technical subjects I come across.


Thank you for reading,
Adnan Boz


bozblog@hotmail.com

  1. John Wallis
    March 11, 2012 at 3:24 pm

    Regarding
    “How to use CPU instructions in C# to gain performance”

    The return _mm_testc_si128(a, b);
    line returned an exception…

    Do you have any idea what I did wrong?
    Could there be an alignment issue?

    • March 11, 2012 at 3:47 pm

      Did you include the smmintrin.h?
      What compiler and platform are you using?
      What is the error message?

  2. John Wallis
    March 11, 2012 at 4:06 pm

    I did, I believe as your example stated + VS 2010 Pro + Vista

    interopservices.SEHException

    • March 11, 2012 at 9:43 pm

      Can you send me your code?

  3. John Wallis
    March 11, 2012 at 10:59 pm

    #include
    #include

    #pragma unmanaged

    class SSE4_CPP
    {
    public:
    int PTEST( __int16* bufferA, __int16* bufferB)
    {
    __m128i a, b;

    //transfer the buffers to the _m128i data type, because we do not want to handle with that in managed code
    memcpy(a.m128i_i16, bufferA, sizeof(a.m128i_i16));
    memcpy(b.m128i_i16, bufferB, sizeof(b.m128i_i16));

    //Call the SSE4 PTEST instructions
    return _mm_testc_si128(a, b);
    }
    };

    #pragma managed
    // TestCPUInt.h

    #pragma once

    using namespace System;

    namespace ZhikuiGarage {

    public ref class SSE4
    {
    public:
    int PTestWPointer(__int16* pBufferA, __int16* pBufferB)
    {
    SSE4_CPP * sse4_cpp = new SSE4_CPP();
    int j =0;
    j=sse4_cpp->PTEST(pBufferA, pBufferB);
    return j;
    }
    };

    public ref class Class1
    {
    // TODO: Add your methods for this class here.
    };
    }
    //#endif

    //——
    using System;
    using System.Windows.Forms;
    using System.Diagnostics;

    namespace ZhikuiGarage
    {
    public partial class Form1 : Form
    {
    public Form1()
    {
    InitializeComponent();
    }

    void TESTSSE()
    {
    int testCount = 1;// 10000000;
    short[] buffer1 = new short[8];
    short[] buffer2 = new short[8];

    for (int i = 0; i < 8; i++)
    {
    buffer1[i] = 32100;
    buffer2[i] = 32100;
    }

    Stopwatch sw = new Stopwatch();
    int testResult = 0;
    #if true
    sw.Start();
    testResult = 0;
    for (int i = 0; i < testCount; i++)
    testResult = TestCPUWithPointer(buffer1, buffer2);
    sw.Stop();
    MessageBox.Show("SSE4 PTEST took " + sw.Elapsed.ToString() + " and returned {1} " + testResult.ToString());
    #endif
    sw.Start();
    for (int i = 0; i < testCount; i++)
    testResult = TestCLR(buffer1, buffer2);
    sw.Stop();
    MessageBox.Show("C# Test took " + sw.Elapsed.ToString() + " and returned {1} " + testResult.ToString());

    }

    private void Form1_Load(object sender, EventArgs e)
    {
    }

    static int TestCLR(short[] bufferA, short[] bufferB)
    {
    //We want to test if all bits set in bufferB are also set in bufferA
    for (int i = 0; i < bufferA.Length; i++)
    {
    if ((bufferA[i] & bufferB[i]) != bufferB[i])
    return 0;
    }
    return 1;
    }
    #if true
    static int TestCPUWithPointer(short[] bufferA, short[] bufferB)
    {
    SSE4 sse4 = new SSE4();
    unsafe
    {
    //fix the buffer variables in memory to prevent from getting moved by the garbage collector
    fixed (short* pBufferA = bufferA)
    {
    fixed (short* pBufferB = bufferB)
    {
    return sse4.PTestWPointer(pBufferA, pBufferB);
    }
    }
    }
    }

    private void button1_Click(object sender, System.EventArgs e)
    {
    TESTSSE();
    }
    #endif
    }
    }

    • March 12, 2012 at 9:49 pm

      Hi John,
      I built a solution from your code as a console app and it works fine. You can download it from this link; please check it out. I hope it works for you. If it does not, your CPU may not support those extensions (SSE4.1 in this case). You can check http://en.wikipedia.org/wiki/Streaming_SIMD_Extensions to see if your CPU is in the list.
      Regards,
      Adnan
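      Instead of looking the CPU up in a list, the check can also be done at run time. The following is only a minimal sketch, not code from the original post: the function name cpu_has_sse41 is made up for illustration, and it assumes an x86/x64 build with MSVC, GCC, or Clang. It reads CPUID leaf 1 and tests ECX bit 19, which is the SSE4.1 feature flag.

```cpp
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
  #include <intrin.h>   // __cpuid (MSVC)
#elif defined(__x86_64__) || defined(__i386__)
  #include <cpuid.h>    // __get_cpuid (GCC / Clang)
#endif

// Hypothetical helper: returns true when CPUID leaf 1 reports SSE4.1
// (ECX bit 19). On non-x86 targets it simply returns false.
bool cpu_has_sse41()
{
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
    int regs[4] = {0, 0, 0, 0};          // EAX, EBX, ECX, EDX
    __cpuid(regs, 1);
    return (regs[2] & (1 << 19)) != 0;
#elif defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;                    // CPUID leaf 1 not available
    return (ecx & (1u << 19)) != 0;
#else
    return false;
#endif
}
```

      If this returns false, calling _mm_testc_si128 would be expected to fail with an illegal-instruction fault, so the caller should fall back to the plain loop in that case.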

  4. John Wallis
    March 12, 2012 at 10:28 pm

    Thank you. I get a feeling the error I was getting was due to a C# decoration thing.
    I am somewhat new to C#: I have done the SSE2 with VC6 (extensions and asm).

    My aim is to get it to go with the fancy Windows stuff (and my current job likes C#).

    Thank You.

  5. John Wallis
    March 13, 2012 at 12:44 pm

    OK, Well that worked … I think I’m on my own now
    Thanks.

    By the way, the SSE4 version was slower. I will do other tests, etc for vector stuff.

    Thanks

    • March 13, 2012 at 9:56 pm

      I’m glad that you got it to work. Performance varies by hardware and implementation. The point is that you now have a tool you can use where it makes a difference.
      Another way is to use a native DLL instead of the CLR class library. That way you don’t have to deal with mixed managed and unmanaged code. Please check out one of my other posts for a native DLL P/Invoke example.

      Best regards,
      Adnan
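      To illustrate the native DLL route, here is a rough sketch of what the exported function could look like. It is an assumption-laden example, not the code from the other post: the names NATIVE_API and ptest_contains_all_bits are hypothetical, and the SSE4.1 path is only compiled when the intrinsics are available at build time (always on MSVC for x86/x64, with -msse4.1 on GCC or Clang); otherwise it falls back to the same loop as TestCLR. It uses _mm_loadu_si128, which has no alignment requirement, so the pinned C# arrays can be passed directly.

```cpp
#include <cstdint>

#if defined(__SSE4_1__) || (defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64)))
  #include <smmintrin.h>          // SSE4.1 intrinsics
  #define HAVE_SSE41_INTRINSICS
#endif

// Hypothetical export macro: C linkage so the name is not mangled.
#if defined(_WIN32)
  #define NATIVE_API extern "C" __declspec(dllexport)
#else
  #define NATIVE_API extern "C"
#endif

// Returns 1 if every bit set in bufferB is also set in bufferA, else 0.
// Both buffers must hold at least 8 16-bit values (128 bits).
NATIVE_API int ptest_contains_all_bits(const int16_t* bufferA,
                                       const int16_t* bufferB)
{
#ifdef HAVE_SSE41_INTRINSICS
    // Unaligned loads: the caller's arrays need not be 16-byte aligned.
    __m128i a = _mm_loadu_si128(reinterpret_cast<const __m128i*>(bufferA));
    __m128i b = _mm_loadu_si128(reinterpret_cast<const __m128i*>(bufferB));
    // PTEST carry flag: 1 when (~a & b) is all zeros.
    return _mm_testc_si128(a, b);
#else
    // Scalar fallback with the same meaning as the intrinsic path.
    for (int i = 0; i < 8; ++i)
    {
        if ((bufferA[i] & bufferB[i]) != bufferB[i])
            return 0;
    }
    return 1;
#endif
}
```

      On the C# side, this would be declared with a DllImport attribute that names the DLL you build (the name is up to you) and sets CallingConvention.Cdecl, since the default calling convention differs on 32-bit Windows. The SSE4.1 path still needs a CPU that supports the instruction.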
