C# language, C++, CodeProject, GPGPU, Technical

Faster Sorting in C# by Utilizing GPU with NVIDIA CUDA


Introduction

In this post I would like to give an entry-level example of how you can use NVIDIA CUDA technology to achieve better performance in C# with the minimum possible amount of code. This is a very specific scenario where the main application is in C#, the sorting algorithm is provided in C++ as an API, and you have an NVIDIA CUDA-capable graphics card. I hope this basic sample gives you some insight and encourages you to invest in parallel technologies.

Background

Parallel Computing
For more than two decades, CPUs drove rapid performance increases and price reductions. As a result, we got used to our code getting a speed boost simply by running on a new piece of hardware. This relentless performance improvement is usually summed up in two words: Moore's law. While the continuation of Moore's law is debated in many forums because of the slowdown that started around 2003, many of us have already jumped on the parallel bandwagon, using systems based on SIMD or MIMD models. With parallel computing, application logic suitable for parallelization will most likely run faster without waiting for tomorrow's hardware. (It is "most likely" because there are theoretical limits explained by Amdahl's law.)
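As a quick illustration of that limit: if a fraction P of a program can be parallelized and runs on N processors, Amdahl's law caps the overall speedup at

Speedup = 1 / ((1 - P) + P / N)

so with P = 0.9, even infinitely many processors cannot push the speedup past 1 / (1 - 0.9) = 10x.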

GPGPU
A GPU is a massively multithreaded processor which supports thousands of concurrent threads. Using the GPU to do general-purpose scientific and engineering computing is called GPGPU. NVIDIA is one of the leading GPU manufacturers, and they provide fully programmable GPU technology for high-level languages like C, C++ and Fortran through the architecture named CUDA. NVIDIA hardware uses the SIMT (single-instruction multiple-thread) execution model rather than the popular SIMD model. I gave a SIMD example in one of my previous posts, showing how to utilize an SSE4 CPU instruction within C#. In this post we will be using the SIMT execution model through NVIDIA CUDA. (Under the covers, with SIMD multiple data elements for a single instruction are collected and packed into a single register; with SIMT, each thread processes data in its own registers.)

Thrust
Writing code directly against the CUDA API is very powerful in terms of controlling the hardware, but it requires you to handle memory and kernel execution yourself, which is outside the scope of this post. Therefore I'll use a high-level programming interface named Thrust. As described at http://code.google.com/p/thrust/, Thrust is a CUDA library of parallel algorithms with an interface resembling the C++ Standard Template Library (STL). The Thrust library provides many fundamental building blocks like sorting, prefix sums, reductions, transformations, etc.

Thrust consists of header files and is distributed with the CUDA 4.0 installation, so you don't have to install it separately.

Radix Sort
The reason I chose sorting is that it is a widely used fundamental computational building block for many problems. Radix sort is one of the oldest algorithms and is very efficient for sorting small keys sequentially, with an algorithmic complexity of O(n). Radix sort is one of the two sorting algorithms Thrust provides, so Thrust fits our example very well. There is a paper, Designing Efficient Sorting Algorithms for Manycore GPUs, which explains how the algorithm works on the GPU.

Setup

We will be using Visual Studio 2010 to host one C# application and one C++ library. The only additional API is the NVIDIA CUDA Toolkit 4.0. You will need to install the CUDA Toolkit and a supported graphics device driver from the same link.
Also, do not forget to install the Build Customization BUG FIX Update from the same link or from here.

Implementation

The solution consists of one C# console application and one native C++ DLL. The C++ DLL exports a method that runs the sort algorithm on the GPU. The C# code uses platform invoke (P/Invoke) to call the exported method of the C++ DLL.

For comparison purposes I'm also providing a C# implementation of radix sort. The C# implementation takes an array of 32-bit integers and processes it in three steps for every 8-bit part:
1) Creating a histogram of the data,
2) Running a prefix sum (scan),
3) Placing the data into bins.

In some implementations the histogram is called a count or frequency table, but here I used the name histogram because there is a very good CUDA implementation I may write about in a future post. The step simply counts how many times each byte value occurs in the array. Because the maximum byte value is 0xFF, the size of the histogram is 256, reflecting the maximum number of different values.

The second step is to run a prefix sum over the histogram: each element becomes the sum of all elements to its left (an exclusive scan). For example, if the histogram starts [3, 1, 0, 2, ...], the prefix sum starts [0, 3, 4, 4, ...]; byte value 0 will be placed starting at index 0, byte value 1 starting at index 3, and so on. There is also a very efficient way to do a prefix sum in a parallel environment, explained in the paper Efficient Parallel Scan Algorithms for GPUs, which you may find useful for understanding the logic behind the GPU implementation.

The last step is to place each value into the correct bin, using the prefix-sum entry for the value's current byte as the index into the temporary array.

The code is as follows:

public static void CPURadixSort(int[] sourceData)
{
 int size = sourceData.Length;
 //temporary array for every byte iteration
 int[] tempData = new int[size]; 
 //histogram of the last byte 
 int[] histogram = new int[256]; 
 //The prefix sum of the histogram
 int[] prefixSum = new int[256]; 
 			
 unsafe
 {
  fixed (int* pTempData = tempData)
  {
   fixed (int* pSourceData = sourceData)
   {
    int* pTemp = pTempData;
    int* pBck;
    int* pSource = pSourceData;
     
    //Loop through every byte of 4 byte integer
    for (int byteIdx = 0; byteIdx < 4; byteIdx++)
    {
     	int shift = byteIdx * 8; //total bits to shift the numbers
     
     	//Calculate histogram of the last byte of the data
     	for (int i = 0; i < size; i++)
     	 histogram[(pSource[i] >> shift) & 0xFF]++;
     
     	//Calculate prefix-sum of the histogram
     	prefixSum[0] = 0;
     	for (int i = 1; i < 256; i++)
     	 prefixSum[i] = prefixSum[i - 1] + histogram[i - 1];
     
     	//Get the prefix-sum index of the current byte and increase
        //it by one. That gives us the index where we want to place
        //the data
     	for (int i = 0; i < size; i++)
     	 pTemp[prefixSum[(pSource[i] >> shift) & 0xFF]++] = pSource[i];
     
     	//Swap the pointers
     	pBck = pSource;
     	pSource = pTemp;
     	pTemp = pBck;
     
     	//reset the histogram
     	for (int i = 0; i < 256; i++)
     	 histogram[i] = 0;
    }
   }
  }
 }
}

One more detail about the code: I used unsafe code in order to swap the pointers to the temporary buffer and the source data, which gained me about another half second. It is possible to remove that part and use tempData.CopyTo(sourceData, 0) instead.
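For reference, a minimal managed sketch of that safer variant could look like the following (same algorithm, no unsafe code; the method name CPURadixSortSafe is mine):

public static void CPURadixSortSafe(int[] sourceData)
{
 int size = sourceData.Length;
 //temporary array for every byte iteration
 int[] tempData = new int[size];
 //The prefix sum of the histogram
 int[] prefixSum = new int[256];

 //Loop through every byte of the 4 byte integers
 for (int byteIdx = 0; byteIdx < 4; byteIdx++)
 {
  int shift = byteIdx * 8; //total bits to shift the numbers
  int[] histogram = new int[256]; //fresh histogram for every pass

  //Histogram of the current byte
  for (int i = 0; i < size; i++)
   histogram[(sourceData[i] >> shift) & 0xFF]++;

  //Prefix sum of the histogram
  prefixSum[0] = 0;
  for (int i = 1; i < 256; i++)
   prefixSum[i] = prefixSum[i - 1] + histogram[i - 1];

  //Place every value into its bin
  for (int i = 0; i < size; i++)
   tempData[prefixSum[(sourceData[i] >> shift) & 0xFF]++] = sourceData[i];

  //Copy back instead of swapping pointers
  tempData.CopyTo(sourceData, 0);
 }
}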

When I ran the unsafe version above on an Intel Xeon 2.40 GHz CPU for 2^25 (33,554,432) random integers, it executed in 3.038 seconds.

The next step is to create the sorting code on the GPU and call it from our C# code. For this we need to create a DLL in C++ by using the default .NET project wizard. Please follow these steps to accomplish that:
Step 1) Create a Visual C++ project named "RadixSortGPU" in Visual Studio 2010, selecting DLL and Empty project in the wizard,

Step 2) Open Build Customizations from the C++ project context menu, and check the CUDA 4.0 (.targets, .props) checkbox,

Step 3) Open the project properties, expand the Configuration Properties, Linker and select Input. Edit the additional dependencies and add cudart.lib.

Step 4) Add a new empty source file ending with .cu and paste the following code into it:

#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/copy.h>
#include <thrust/detail/type_traits.h>

extern "C" __declspec(dllexport) void __cdecl GPURadixSort(int*, unsigned int);

extern void GPURadixSort(int* data, unsigned int numElements)
{
	//Copy the input array into a device_vector, which resides in GPU global memory
	thrust::device_vector<int> d_data(data, data + numElements);

	//Sort the data on the GPU
	thrust::stable_sort(d_data.begin(), d_data.end());

	//Copy the sorted data back from GPU memory into our input array
	thrust::copy(d_data.begin(), d_data.end(), data);
}

This code is the only program logic you need to sort a given integer array on the GPU. For the sake of this example I did not include any exception handling or validation, but for production code you will most probably want to add both.

As you can see in the code, getting started with Thrust is very simple: you just include the required headers.

The dllexport declaration makes GPURadixSort callable from another library. This is how we will be able to call the method from our C# code.

The GPURadixSort function itself simply copies the data array into a container named thrust::device_vector, which places the data in the GPU's global memory, sorts the data on the GPU, and copies the data back from GPU memory into our input array. The begin() and end() methods are iterators pointing to the data at the beginning and end of the vector. For more information on how to use vectors and iterators, please visit the Thrust Quick Start Guide.

Step 5) I'm assuming you already have a C# project; paste the following code into your class,

[DllImport("RadixSortGPU.dll", CallingConvention = CallingConvention.Cdecl)]
public static extern void GPURadixSort(
 [MarshalAsAttribute(UnmanagedType.LPArray, ArraySubType = UnmanagedType.I4)]
 int[] data, uint numElements);


This code provides the information needed to call the GPURadixSort function exported from our native DLL.

Step 6) That's it. Now you can call GPURadixSort within your C# code to sort your array.
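A minimal calling sketch could look like this (assuming RadixSortGPU.dll sits next to your executable; the array size and seed are arbitrary):

int[] data = new int[1 << 25];
Random rnd = new Random(42);
for (int i = 0; i < data.Length; i++)
 data[i] = rnd.Next();

var sw = System.Diagnostics.Stopwatch.StartNew();
GPURadixSort(data, (uint)data.Length);
sw.Stop();
Console.WriteLine("GPU sort took {0}", sw.Elapsed);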

In my environment, with an NVIDIA Quadro 600 GPU, the sort takes about 1.211 seconds, which is almost 3x faster than the CPU version. The NVIDIA Quadro 600 I used is a low-profile graphics processor which costs about $150. 96 cores, 1 GB memory and 25.6 GB/s memory bandwidth may sound powerful, but it is modest compared to GPUs like the Quadro 6000, with 448 cores, 6 GB memory and 144 GB/s memory bandwidth. There is also the Tesla family of GPUs for high-performance computing needs, which you may want to check out at NVIDIA.

Beyond
Below I provide some code to check whether the GPU memory is large enough to hold the given array, which is one of the important validations you want to perform if you target heterogeneous environments.

using namespace std;
...
extern void GPURadixSort(int* data, unsigned int numElements)
{
 int deviceID = -1;
 if (cudaSuccess == cudaGetDevice(&deviceID))
 {
     cudaDeviceProp devprop;
     cudaGetDeviceProperties(&devprop, deviceID);
     size_t totalMem = (size_t)numElements * sizeof(int); //size_t prevents overflow for very large arrays
     if (devprop.totalGlobalMem < totalMem)
     {
        //Handle error
     }
 }
 ...
}
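Note that totalGlobalMem reports the card's total memory, not how much is currently free; if you need a stricter check, the CUDA runtime also offers cudaMemGetInfo, which returns both the free and the total amount of device memory.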

As I mentioned before, the powerful CUDA API resides under the high-level Thrust library. If Thrust does not have the functionality you are looking for, you can always use CUDA directly. It is not as easy as using Thrust, but if you follow the available resources, it will not take you long to start using this great technology.

Conclusion

We use parallelism mostly to gain performance when a program runs slowly or we need to handle a huge amount of computation. But I believe that parallelism is also a very important software aspect, allowing us to use the underlying hardware wisely and gain an advantage in a competitive market where two functionally similar applications can only compete on software qualities like performance. There are many parallel technologies which will allow you to achieve higher performance with the same hardware. One of them is definitely NVIDIA CUDA; some others you may want to look into are TPL, PPL, C++ AMP, OpenCL and MPI.

C# language, C++, CodeProject, Technical

How to use CPU instructions in C# to gain performance


Introduction

Today, the .NET Framework and C# have become very common for developing even the most complex applications with ease. I remember that before getting our hands on C# in 2002, we were using all kinds of programming languages for different purposes, ranging from Assembly and C++ to PowerBuilder. But I also remember the power of using Assembly or C++ to squeeze every last drop of performance out of your hardware. Every once in a while I get a project where using regular framework functionality puts my computer on the grill for a couple of days to calculate something. In those cases I go back to the good old C++ or Assembly routines to use the full power of my computer. In this post I will show you the simplest way to take advantage of your hardware without introducing much code complexity.

Background

I believe that samples are the best teacher; therefore, I'll be using a sample CPU instruction from the Streaming SIMD Extensions (SSE). SSE is just one of the many instruction set extensions to the x86 architecture. I'll be using an instruction named PTEST from the SSE4 instruction set, which is available in almost all Intel and AMD CPUs today. You can visit the links above for the supported CPUs. PTEST helps us perform a bitwise comparison between two 128-bit parameters. I picked this instruction because it is a good sample that also uses data structures. You can easily look up any other instruction set online for your project requirements.
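To be precise about what that comparison means: the _mm_testc_si128(a, b) intrinsic we will use returns 1 exactly when (~a & b) is all zeros over the full 128 bits, in other words when every bit set in b is also set in a.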

We will write unmanaged C++, wrap it with managed C++, and call it from C#. Don't worry, it is easier than it sounds.
Thanks to MSDN, we have all the necessary information in the Alphabetical Listing of Intrinsic Functions. The PTEST intrinsic _mm_testc_si128 is documented at http://msdn.microsoft.com/en-us/library/bb513983.aspx. If we were writing plain C++, we would end up with the code from the MSDN link:

#include <stdio.h>
#include <smmintrin.h>

int main ()
{
    __m128i a, b;

    a.m128i_u64[0] = 0xAAAA55551111FFFF;
    b.m128i_u64[0] = 0xAAAA55551111FFFF;

    a.m128i_u64[1] = 0xFEDCBA9876543210;
    b.m128i_u64[1] = 0xFEDCBA9876543210;

    int res1 = _mm_testc_si128(a, b);

    a.m128i_u64[0] = 0xAAAA55551011FFFF;

    int res2 = _mm_testc_si128(a, b);

    printf_s("First result should be 1: %d\nSecond result should be 0: %d\n",
                res1, res2);

    return 0;
}

I would like to point out that there are many ways to develop software, and the one I'm providing here may not be the best solution for your requirements. I'm just providing one way to accomplish the task; it is up to you to fit it into your solution.

OK, let's start with a fresh new solution. (I'm assuming you have Visual Studio 2010 with the C++ language installed.)

1 ) Add a new C# console application to your solution, for testing purposes
2 ) Add a new Visual C++ > CLR > Class Library project to your solution, named TestCPUInt
3 ) In the console application, add a reference to TestCPUInt from the Projects tab

Now we are ready to code.
4 ) Open the TestCPUInt.h file, if it is not already open
5 ) Insert the following code at the top of the file

#include <smmintrin.h>
#include <memory.h>

#pragma unmanaged

class SSE4_CPP
{
//Code here
};

#pragma managed

This is the infrastructure for our unmanaged C++ code, which will make the SSE4 call. As you can see, I placed our unmanaged code between #pragma unmanaged and #pragma managed. I think being able to write unmanaged and managed code together is a great feature. You may prefer to place the unmanaged code in another file, but I used one file for clarity. We include two header files, smmintrin.h and memory.h: the first one is for the SSE4 intrinsics and the other one is for the method I use to copy memory.
6 ) Now paste the following code at the //Code here location:

public:
	int PTEST( __int16* bufferA, __int16* bufferB)
	{
		__m128i a, b;

		//transfer the buffers into the __m128i data type, because we do not want to deal with that in managed code
		memcpy(a.m128i_i16, bufferA, sizeof(a.m128i_i16));
		memcpy(b.m128i_i16, bufferB, sizeof(b.m128i_i16));

		//Call the SSE4 PTEST instructions 
		return _mm_testc_si128(a, b);
	}

The _mm_testc_si128 intrinsic will emit the SSE4 PTEST instruction. We have a little memory operation right before it to fill the __m128i data structures on the C++ side. I used memcpy to transfer the data from the bufferA and bufferB arguments into the __m128i structures before pushing them to PTEST. I preferred to do this here to keep the whole SSE4-specific implementation separate. I could also have passed the __m128i to the PTEST method, but that would be more complex.

As I mentioned before, in this example I used the PTEST sample with a data structure; you may run into other instructions which require only a pointer, in which case you don't need the memcpy operation. There may be some challenges if you are not familiar with C++, especially since IntelliSense was removed for VC++ in Visual Studio 2010, but you can always look for answers online. For example, the __m128i data structure is defined in the emmintrin.h file, which is located under [Program Files]\Microsoft Visual Studio 10.0\VC\include. You can also check the list of fundamental data types if you are not sure what to use instead of __int16*.

7 ) Now paste the following code at the top of your managed C++ code, which is at the bottom of your TestCPUInt.h file in the namespace section.

namespace TestCPUInt {

	public ref class SSE4
	{
	public:
		int PTestWPointer(__int16* pBufferA, __int16* pBufferB)
		{
			//Stack instance; no need to new up (and leak) the helper on every call
			SSE4_CPP sse4_cpp;
			return sse4_cpp.PTEST(pBufferA, pBufferB);
		}
	};
}

What we do here is pass the pointers pBufferA and pBufferB, which we will receive from C#, forward into the unmanaged code. For those who are not familiar with pointers, the * sign defines a pointer: __int16* means a pointer to a 16-bit integer. In our case that is the address of the first element of an array.
There are also ways to call a dynamic library without managed C++, but as I mentioned before, I'm showing the simplest way for a C# developer.

8 ) Let’s go to our C# code in the console application to use this functionality.
9 ) First we have to allow unsafe code in the application. To do that, go to the project properties and check "Allow unsafe code" under the Build tab (this corresponds to the /unsafe compiler option).
10 ) Add the following code to your Program.cs file:

static int TestCPUWithPointer(short[] bufferA, short[] bufferB)
{
	SSE4 sse4 = new SSE4();
	unsafe
	{
		//pin the buffers in memory to prevent the garbage collector from moving them
		fixed (short* pBufferA = bufferA)
		{
			fixed (short* pBufferB = bufferB)
			{
				return sse4.PTestWPointer(pBufferA, pBufferB);
			}
		}
	}
			
}

If you have never used unsafe code before, you can check out unsafe (C# Reference). It is fairly simple logic: PTestWPointer requires pointers to arrays, and the only way to get a pointer to a managed array is the fixed statement. The fixed statement pins the buffer array in memory in order to prevent the garbage collector from moving it around. But that comes with a cost: in one of my projects the system was slowing down because of too many fixed objects in memory. You may have to experiment for your own project.
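As a side note, if a buffer has to stay pinned longer than a single call, a GCHandle (from System.Runtime.InteropServices) can do the same job as fixed; here is a minimal sketch of my own, not part of the walkthrough:

static unsafe int TestWithGCHandle(short[] bufferA, short[] bufferB)
{
	SSE4 sse4 = new SSE4();
	//Pin both arrays until Free is called
	GCHandle handleA = GCHandle.Alloc(bufferA, GCHandleType.Pinned);
	GCHandle handleB = GCHandle.Alloc(bufferB, GCHandleType.Pinned);
	try
	{
		short* pA = (short*)handleA.AddrOfPinnedObject().ToPointer();
		short* pB = (short*)handleB.AddrOfPinnedObject().ToPointer();
		return sse4.PTestWPointer(pA, pB);
	}
	finally
	{
		//Unpin so the garbage collector can move the arrays again
		handleA.Free();
		handleB.Free();
	}
}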
That’s it!

But we will not stop here; for comparison I implemented the same operation in pure C#, as seen below:

static int TestCLR(short[] bufferA, short[] bufferB)
{
	//We want to test if all bits set in bufferB are also set in bufferA
	for (int i = 0; i < bufferA.Length; i++)
	{
		if ((bufferA[i] & bufferB[i]) != bufferB[i])
			return 0;
	}
	return 1;
}

Here I simply check whether every bit set in bufferB is also set in bufferA, which is exactly what PTEST does in a single instruction.

In the rest of the application I compared the performance of these two methods. Below is the code which does the comparison for the sake of testing:

static void Main(string[] args)
{
	int testCount = 10000000;
	short[] buffer1 = new short[8];
	short[] buffer2 = new short[8];

	for (int i = 0; i < 8; i++)
	{
		buffer1[i] = 32100;
		buffer2[i] = 32100;
	}
					
	Stopwatch sw = new Stopwatch();
	sw.Start();
	int testResult = 0;
	for (int i = 0; i < testCount; i++)
		testResult = TestCPUWithPointer(buffer1, buffer2);	
	sw.Stop();
	Console.WriteLine("SSE4 PTEST took {0:G} and returned {1}", sw.Elapsed, testResult);

	sw.Restart(); //reset and start, otherwise the first measurement would be included
	for (int i = 0; i < testCount; i++)
		testResult = TestCLR(buffer1, buffer2);
	sw.Stop();
	Console.WriteLine("C# Test took {0:G} and returned {1}", sw.Elapsed, testResult);

	Console.ReadKey();
}

In my environment I gained about 20% performance; in some of my projects I have gained up to a 20-fold improvement.
The last thing I would like to show is how to move the fixed usage from C# into managed C++. That makes your code a little cleaner, like the one below:

static int TestCPU(short[] bufferA, short[] bufferB)
{
	SSE4 sse4 = new SSE4();
	return sse4.PTest(bufferA, bufferB);
}

As you can see, it is only a method call. In order to do this, we have to add the following method to the SSE4 class in the TestCPUInt.h file:

int PTest(array<__int16>^ bufferA, array<__int16>^ bufferB)
{
	pin_ptr<__int16> pinnedBufferA = &bufferA[0]; // pin pointer to first element in arr
	__int16* pBufferA = (__int16*)pinnedBufferA; // pointer to the first element in arr
	pin_ptr<__int16> pinnedBufferB = &bufferB[0]; // pin pointer to first element in arr
	__int16* pBufferB = (__int16*)pinnedBufferB; // pointer to the first element in arr

	SSE4_CPP sse4_cpp; //stack instance, as in PTestWPointer
	return sse4_cpp.PTEST(pBufferA, pBufferB);
}

This time our method takes a managed array object instead of an __int16 pointer and pins it in memory, just like we did with fixed in C#.

Conclusion

I believe that no matter how high-level the frameworks we use become, there will always be situations where we have to use our hardware resources more wisely. Sometimes these performance improvements save us a big amount of hardware investment.
Please go ahead and look into CPU instructions, parallel computing, GPGPU, etc., and understand the hardware; knowing your tools better will make you a better software architect.

C# language, CodeProject, Technical

Careful with Optional Arguments in C#4.0


Introduction

Some time has passed since optional arguments were introduced with Visual C# 2010, and we have all gotten used to the convenience of not having to define method overloads for every different method signature. But recently I came across a limitation of using optional arguments in enterprise solutions, and now I use them with care.

The limitation is that if you use optional arguments across libraries, the compiler will hard-code the default value into the consumer and prevent you from re-deploying the provider library separately. It is a very common scenario in enterprise applications to have many libraries with different versions married to each other, and this limitation makes it impossible to re-deploy only one DLL without re-deploying all related libraries.

In this post I will explain the details of this limitation. As I mentioned in my previous post, Under the hood of anonymous methods in C#, it is very important to know the underlying mechanics of the functionality you are using. This post is similar, explaining the mechanics of optional arguments.

Background

I will not explain what optional arguments are, because they have been around for a while now. But I will give a quick reference to the description for those who are new to C#:

"Optional arguments enable you to omit arguments for some parameters. … Each optional parameter has a default value as part of its definition. If no argument is sent for that parameter, the default value is used. Default values must be constants." (Named and Optional Arguments (C# Programming Guide))

I ran into this limitation while I was using some functionality from a secondary library, injected through an IoC container. The secondary library was accessed through an interface where some methods had optional arguments. I had everything deployed and working until I had to make some changes to the secondary library and alter an optional argument's default value. After re-deploying the secondary library I noticed that the change did not take effect, and when I went into the IL code I found that the main library had the constant hard-coded into it.

In my situation I had interfaces, injection and a lot of complexity; to better picture the situation I will use the simplest form of the limitation, as in the following sample:

  • ProjectB has a method named Test with the following signature:


public void Test(string arg1 = "none")
{
}

  • ProjectA references ProjectB and uses the Test method with its default argument by making the following call:


static void Main(string[] args)
{
	Class1 class1 = new Class1();
	class1.Test();
}

This works very well, because ProjectA is simply using arg1 with the value "none". Now let's look at what is happening behind the scenes:

If we compile ProjectA along with ProjectB and analyze the IL code, we will see that the optional argument is nothing more than a compiler feature. Because calling Test() is no different from calling Test("none"), the compiler compiles our code as Test("none"). This can be seen in the IL code and decompiled C# code below; the string constant "none" is hard-coded into ProjectA.


.method private hidebysig static void Main(string[] args) cil managed
{
...
L_0008: ldstr "none"
L_000d: callvirt instance void [ProjectB]ProjectB.Class1::Test(string)
...
}


private static void Main(string[] args)
{
	new Class1().Test("none");
}

For tightly coupled libraries or in-library usage, it is good that the compiler helps us eliminate some code and makes our life easier. But this comes with a price:

Let's say we had to modify the Test method in ProjectB to Test(string arg1 = "something") and re-deploy it without re-deploying ProjectA. In this case ProjectA would still be calling the Test method with "none".
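If you have to support that kind of partial deployment, one defensive pattern (a sketch, not part of the original scenario) is to fall back to classic overloads at the library boundary, so that the default value stays inside ProjectB:

public void Test()
{
	//The default is resolved here, inside ProjectB, at run time
	Test("none");
}

public void Test(string arg1)
{
}

Now changing the default and re-deploying only ProjectB is enough, because ProjectA's IL contains a plain call to Test() instead of a baked-in constant.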

Conclusion

Knowing this, it is wise to use optional arguments with caution across libraries when you have to support deployment scenarios with partial solutions.

Architecting, CodeProject, Technical

Overview: Using Claims-based Access on Windows Azure


Introduction

In this post, I would like to talk about a technology which simplifies user access for developers by allowing them to build claims-aware applications: Windows Identity Foundation (WIF). The goal is to improve developer productivity, enhance application security, and enable interoperability. For most of us claims-aware authentication is nothing new; the new part I'm writing about is how to implement it in the Windows Azure environment. If you have already developed solutions with WIF and know what claims-based identity means, you can save some reading and jump directly to the Windows Azure section.

Background

Authorization and authentication are very important aspects of any software solution. But somehow many software solutions are not designed well enough to protect private or important data against attackers.

Many applications have their own user identity store built into their business layer, where it has to be maintained and supported along with the application by developers. I'm not saying that every solution in the world is like this; there are many examples of centralized authorization and authentication solutions. But as we enter another era, where we move our solutions to Windows Azure, it is becoming clear that our security solutions do not provide the required scalability and extensibility.

Claims-based Solution

Thanks to Windows Identity Foundation (WIF), we have the option to stop writing custom identity plumbing and user identity databases for every application using the .NET Framework. So, what is claims-based identity? I'll try to explain it very briefly; the following diagram shows a simple and classic authentication implementation:

Diagram – 1

The client enters a user name and password (1) to get access to the secure area. The data is passed to the security layer (2), where the credentials are validated against a user identity store (3). The result is most probably returned in the form of some user claims (4) and passed to the application (5). The term "claims" is used here to express that the data returned from the identity store contains more than just user properties. A claim can be a user's email address, department, etc. But claims are a little more than just user properties; claims include trust information about the provider too. At the end the client may receive access to the secure data (6).

Windows Identity Foundation

There is nothing wrong with this picture until you want to implement a different security scheme, enhance your security, or add one more application to your system. The problem is that you have to modify your identity and security code. Claims-based identity allows you to decouple this logic from the heart of your application and give the responsibility to another entity: the identity provider, as seen below:

Diagram – 2

The client starts by requesting the token requirements from the application (1), receives them (2), and passes them to the identity provider's Security Token Service (STS) along with the user name and password (3). The STS validates the user information against the user identity store (4) and receives the token answers in the form of claims (5). It then passes the token, along with the claims and a public key, back to the client (6). The client forwards the token to the application (7). The application uses (8) WIF to resolve (9) the token (canonicalization, signature checking, decryption, checking for expiration, checking for duplication, etc.) and receives the claims it requires out of the token (10) if the token is valid. At the end the client may receive access to the secure data (11).

At first look it seems like we are introducing a lot of steps (from 6 to 11), but when you think about the simplicity of the code you have to write to authenticate a user, using WIF saves a lot of trouble. For example, the only code I have to write to get a claim looks like this:


protected void Page_Load(object sender, EventArgs e)
{
    IClaimsIdentity claimsIdentity = ((IClaimsPrincipal)(Thread.CurrentPrincipal)).Identities[0];

    String userEmail =
        (from c in claimsIdentity.Claims
         where c.ClaimType == System.IdentityModel.Claims.ClaimTypes.Email
         select c.Value).FirstOrDefault();
}

In this code, we access our claims identity instance and query the claims for the email address, which is a claim from a token generated by the STS. The rest is only application configuration. For example, in a very simple implementation we could allow some pages to be seen only by a certain role, with the following configuration:

<location path="SecretPage.aspx">
  <system.web>
    <authorization>
      <allow roles="Manager"/>
      <deny users="*"/>
    </authorization>
  </system.web>
</location>
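With the role claims mapped this way, the same check also works programmatically through the standard IPrincipal interface; a quick sketch:

protected void Page_Load(object sender, EventArgs e)
{
    //Role claims surface through the regular IsInRole check
    if (Thread.CurrentPrincipal.IsInRole("Manager"))
    {
        //Show the link to SecretPage.aspx
    }
}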

Please check out the SDK and toolkit links below; you will really enjoy the practicality of this framework.

Practicality is not the only advantage; it is also powerful when we want to implement different authentication technologies. For example, switching to Active Directory will not require any code changes in your application, nor in the communication with your application, as seen below:

Diagram – 3

With that said, I would like to introduce you to the other parts of Microsoft's Identity and Access Platform family: Active Directory Federation Services (ADFS) 2.0 and Windows CardSpace 2.0. You can get more information about these technologies at
http://msdn.microsoft.com/en-us/security/aa570351.aspx. I'll not get into the details of ADFS and CardSpace because they are outside our current scope, but they are nice technologies which you may use for your custom solution.

Customization

You can also develop your own custom STS with the help of the Windows Identity Foundation SDK. Creating a custom STS is as simple as right-clicking your project, selecting "Add STS reference…" and following the wizard to create a new STS project. For more information, you can download WIF and the WIF SDK from these links:

http://www.microsoft.com/downloads/en/details.aspx?FamilyID=eb9c345f-e830-40b8-a5fe-ae7a864c4d76&displaylang=en

http://www.microsoft.com/downloads/en/details.aspx?FamilyID=c148b2df-c7af-46bb-9162-2c9422208504&displaylang=en.

And please do not forget the toolkit from here
http://www.microsoft.com/downloads/en/details.aspx?displaylang=en&FamilyID=c3e315fa-94e2-4028-99cb-904369f177c0, which provides all the code examples you would need. Just a note: please start by setting up your environment as described in Setup.docx in the Assets folder of the toolkit. I'm one of those who read the manual last; you can save yourself the time I lost until that point. :)

Windows Azure

Now, how will all this help us on the Windows Azure platform?

There are many scenarios in which you can use the Windows Azure platform as part of your solution. One of them is to place your application in the cloud and keep your STS on-premises. In this way you can still take advantage of local identities for authenticating your users while also taking advantage of Windows Azure features like load balancing, scalability, etc. The following diagram shows this example:

Diagram – 4

The logic is very similar to the previous ones; the client connects to the application (6) only after sending the application (in this case an ASP.NET web application on an ASP.NET web role) a token (5) which was obtained from the identity provider (4) based on the requirements defined by the application (2). I did not draw the details of the identity provider and application because they can vary, as I described in the claims-based identity scenarios. Please notice that WIF is still used in the application on Windows Azure. But WIF is not in the GAC that is visible to Windows Azure applications; therefore I want to point out that it has to be referenced manually.

One of the important details is that the application hosted in Windows Azure can have different URIs for the development, testing and production environments. Therefore the application's URI should be dynamically embedded into the token as the reply address. As a result, the STS also has to be modified to validate the reply URI.

Another detail is establishing a trust relationship between the ASP.NET web role and the STS. That is simply done with the wizard which shows up when you right-click your application project and select "Add STS reference…".

Another key scenario is to delegate the identity-providing part to Windows Azure as well, by using the Windows Azure AppFabric Access Control technology instead of our own STS.

The usage of AppFabric Access Control is shown in the following diagram:

Diagram – 5

As you can see, the claims-based workflow has not changed much; I only simplified the communication (1) for clarity purposes. SWT stands for Simple Web Token.

You can get more details about this scenario from the WIF toolkit labs at
http://msdn.microsoft.com/en-us/wazplatformtrainingcourse_introtoacslab2010_topic2.aspx .

Conclusion

Using claims-based access solutions simplifies a project in many aspects like maintenance, security, extensibility, etc. Applying this technology to the cloud expands our endless solution space in yet another dimension. I have only scratched the surface of this nice technology; please go ahead and check out the resources I pointed to in this post. I hope you will enjoy it as much as I did.

Architecting, CodeProject, Technical

Are you really addressing the problem?


Introduction

Some time ago I came across a software architect who had a "my way or the highway" attitude about providing solutions. I had a very interesting conversation with him about IoC containers; here is a part of it:

him : What are you using currently as the IoC container.
me : I wrote one based on our companies requirements.
him : No, you can not do that, you have to use Microsoft’s Enterprise Library!

This is the complete opposite of my way of thinking; everything I have read, learned and experienced about software architecture says that you cannot hold on to one solution and try to force it onto every problem. First, the famous saying "if you only know how to use a hammer, every problem is a nail" comes to mind; second, I ask myself: what does he know about my problem domain? Does he know what my non-functional requirements are? Does he know what kind of company, team, timeline or project I'm talking about? Certainly not; he is just thinking solution-first!

After thinking about this situation, I realized that a lot of people think alike: certain things HAVE to be done in a certain way with a certain technology. I believe that this mindset is the biggest roadblock to becoming a good architect who delivers projects successfully. Therefore I decided to write this post to explain some basic steps you can follow which will help you become a better software architect or, better said, deliver projects successfully.

The key term here is "successful project"; a project is meant to be successful when it complies with the requirements. That said, the most important aspect of a solution is its compliance with the requirements.

Requirements

Requirements describe the problem domain. Depending on the project and the chosen methodologies, the requirements can be acquired in advance or adjusted while the project is in progress. Requirements are provided by the stakeholders, usually in the form of a document named the System Requirements Document; the author could be a subject matter expert directly, or a business analyst who is a domain expert.

Note: This document should not be confused with a project vision document. There are different deliverables in software development projects; for more information you can visit Microsoft Solution Framework resources like this link: http://msdn.microsoft.com/en-us/library/dd286619.aspx. Actually, some projects are not started with a requirements document, but that does not mean we can just omit the requirements; an architect needs to work with the stakeholders. It is also well known that there is nothing more temporary than permanent requirements in a project :) but this fact should not be a reason to act like a stranger to the stakeholders.

The problem domain contains a horizontal and a vertical axis. Mostly, the stakeholders are responsible for the vertical axis, which contains their specific industry knowledge. The horizontal axis covers domains which apply to all industries, like SSL, WCF, etc. But I've never seen a document defining the domains; we always define requirements and provide solutions within domains.

Requirements are divided into two main categories: functional and non-functional requirements. A functional requirement defines what a system is supposed to accomplish, like input, output and behavior. Functional requirements are usually implemented in code; they help to design the system and architect the application. But I have seen a lot of architects who think that code is the only artifact that matters in a software development project. This mindset is known as a developer attitude in the solution space, and architects should avoid it.

Note: This mindset is really hard to change, especially if you were promoted from developer to architect. Unfortunately, that is the way it happens today: once you have developed software for 10 years, you become an architect. This makes us believe that architect is simply the next title. But in reality, software architecture is a totally different career path, and only a small part of it is about coding. I strongly suggest reading Microsoft Architecture Journal 15 (http://msdn.microsoft.com/en-us/architecture/cc505966) for those who are architects. A good architect understands that software solutions consist of more than functional requirements.

On the other hand, non-functional requirements define the operation of the system, which gives the project its unique flavor or, better said, its quality. Security, performance, scalability, testability, usability, price, privacy, extensibility, support and deployment are just a few of these qualities.

It is very possible for two functionally identical products to be totally different because of their non-functional requirements; e.g. a web site which will support 10 users for its entire life will be built differently from a web site which will support a million users.

Therefore, I believe that a problem-first thinking approach will save your project a lot of time and money; the more you listen to your stakeholders, study them and understand the problem, the closer the solution you provide will be to their needs.

I once saw a humorous drawing, based on project management but with some architecting flavor in it, that captures this real-world problem; I don't know who the credit should go to, but that picture is worth a thousand words.

Solution

The solution should always come after the requirements are understood well. But that does not mean that the whole project's requirements have to be understood at once. You can grab one, all, or a group of related requirements, understand them, and work on the solution in that context.

That's why I look at the solution as something relative to the requirement. And because every project is unique in time, company, team, budget, culture and requirements, it is impossible to have two identical solutions. Therefore, even having finished one project successfully will never guarantee you success on another.

Every architect provides his solution in the way he knows best. The only common ground is the industry-proven patterns and best practices; those are an architect's tool belt. An architect has to look at the requirements and investigate a tool for the problem. I'm not saying choose a tool, because there are and always will be more tools than you know, no matter how much you learn. I'm suggesting starting every requirement by accepting that we don't know anything about it. The worst that can happen is that we read about a subject a second time; what we gain is extending the solution space beyond our own knowledge.

Perhaps there is no exact formula for architecting a good solution; there is common logic which every human brain can apply, there is experience of which everybody holds a piece, and there are diverse blogs like this one to increase communal innovation. Please also visit the Microsoft Architecture Center (http://msdn.microsoft.com/en-us/architecture/cc511514), which is a great resource for those on the architect path. And please don't forget that every solution is relative to requirements, time and place.

Conclusion

Going back to the top of this post: I would expect a good architect to reply: "Interesting choice; what were the requirements that led you to it?"

C# language, CodeProject, Technical

Under the hood of anonymous methods in C#


Introduction

In this post I'll help you make better decisions about using C# anonymous methods. We all have our favorite or familiar ways to program certain solutions. But I believe that providing the right solution for a problem requires us to free our minds of all biases and think in the context of the problem. Every feature, including GOTO, has a place in this endless solution universe; therefore I'm not telling you what not to use, I'm just showing what is under the hood, so you can make your own decision based on your unique solution.

Background

There are situations where anonymous methods are really useful, especially if you have a small operation you want to do inline. But there is a flip side: you lose readability and testability, two important qualities of enterprise software. In smaller teams or developer-owned projects, these qualities are sometimes omitted. (Nothing bad with that; as I said, there is a place for every feature.) In bigger teams, especially if you have to review or continue another programmer's code, it is really hard to read a line with complex inline code. I have faced this problem many times in code reviews and code ownership changes.

So, let me show you what is going on under the hood, and you can make your own decision:

class Anonymous
{
	public void Method()
	{
		bool test = true;
		Action a = ()=> test = false;
	}
}

Figure – 1

In the example I assigned the anonymous method to an Action to keep the disassembled IL code simple. The code looks very straightforward and simple to write; there is only one delegate, which captures the boolean variable test and sets its value to false.

I kept the code simple to show the structural changes. There is not even a call to the delegate; structurally that has no effect. What happens to the structure in the background after you compile is shown in the disassembled MSIL code below (the IL method bodies are collapsed on purpose, to point to the structure):

.class private auto ansi beforefieldinit Anonymous
    extends [mscorlib]System.Object
{
    .method public hidebysig specialname rtspecialname instance void .ctor() cil managed
    {
    }

    .method public hidebysig instance void Method() cil managed
    {
    }

    .class auto ansi sealed nested private beforefieldinit <>c__DisplayClass1
        extends [mscorlib]System.Object
    {
        .custom instance void [mscorlib]System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor()
        .method public hidebysig specialname rtspecialname instance void .ctor() cil managed
        {
        }

        .method public hidebysig instance void b__0() cil managed
        {
        }

        .field public bool test

    }
}

Figure – 2

As you can see, a private sealed nested class named <>c__DisplayClass1 is created, with one method named b__0 for our anonymous method. In the collapsed code our original Method is modified to create a new instance of <>c__DisplayClass1 and call b__0. For those who are new to IL, I manually translated it back into C# below:

class ILAssembled
{
	public void Method()
	{
		nested nt = new nested();
		nt.test = true;
	}

	private sealed class nested
	{
		public bool test;

		public void MethodNested()
		{
			test = false;
		}
	}
}

Figure – 3

The compiled IL of the C# code above is exactly the same (except for some unimportant small changes). It is important to know that this IL result is for this exact code; in fact, if I moved the local variable test to an instance field, a method would be created instead of a nested class. Therefore, it is always safer to take a look at the disassembled IL code. My favorite tool is Red Gate's .NET Reflector, which can disassemble into many languages. Another big advantage of looking into compiled code is that you can study the .NET Framework namespaces and learn good practices.
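To illustrate that instance-field case with a sketch of my own (easy to verify in a disassembler): when the lambda captures only members of this, the compiler can emit a private method on the class itself, so no display class is needed:

class AnonymousField
{
	bool test = true;

	public void Method()
	{
		//Captures only "this.test", so the compiler compiles the lambda
		//into a method on AnonymousField instead of a nested class
		Action a = () => test = false;
	}
}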

Let's go back to our code. This does not mean that you could write the code in Figure 3 instead of Figure 1 and save some compile time. :) The point is that the magic is in the compiler, in exchange for testability, readability and maybe some performance, because you end up with a nested class instead of a method.

Instead, if I wrote the code below, I would gain all the qualities I was looking for in my unique solution:

public void Method()
{
	bool test = true;
	MyAction(test);
}
private void MyAction(bool test)
{
	test = false;
}

Figure – 4

As you can see, I created a method named MyAction to implement the functionality. Now I can create a unit test for MyAction, and my colleagues can also understand the code at first glance. It is really important to be able to understand code at first glance when you are reviewing tens of classes; cryptic variable and method names, overuse of the implicit type var and overuse of anonymous methods can double or triple the review time.

For clarity I also provide below the disassembled IL structure of Figure 4 (the IL method bodies are collapsed on purpose, to point to the structure):

.class private auto ansi beforefieldinit NonAnonymous
    extends [mscorlib]System.Object
{
    .method public hidebysig specialname rtspecialname instance void .ctor() cil managed
    {
    }

    .method public hidebysig instance void Method() cil managed
    {
    }

    .method private hidebysig instance void MyAction(bool test) cil managed
    {
    }

}

Figure – 5

As you can see, no nested class is declared; the anonymous method has simply become a regular private method.

Conclusion

I believe that knowing what you are using is the key to making the right decision for your unique solution.
