September 26, 2008

Boxing/Unboxing in .NET



This post is long overdue; I should have written it two months ago.
A couple of my friends attended a training course on .NET concepts sponsored by
my company. When they came back, they discussed what they had learnt with me.
One of them mentioned they had been taught that int.ToString() converts an
integer (a value type) to a string (a reference type) and hence boxes the int,
while int.Parse() converts a string to an integer and hence unboxes the string.
When I heard that, I knew deep within me that what they had been taught was
incorrect, but I did not know why. So, I set out to find the answer. To find
out what happens during a call to int.ToString(), I used Lutz Roeder's .NET
Reflector to decompile mscorlib.dll, which contains the implementation of Int32.
(It is in the .NET Framework directory, which on my machine happens to be
C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727.)



The implementation of ToString() is as follows:



public override string ToString()
{
    return Number.FormatInt32(
        this,
        null,
        NumberFormatInfo.CurrentInfo);
}



The Number.FormatInt32() method is declared as follows:



[MethodImpl(MethodImplOptions.InternalCall)]
public static extern string FormatInt32(
    int value,
    string format,
    NumberFormatInfo info);



According to MSDN, the extern modifier is used in C# to declare a method that is
implemented externally. So, the question is: where is FormatInt32() (see above
code fragment) implemented? The answer lies in the MethodImpl attribute which
decorates the method declaration. According to MSDN,
MethodImplOptions.InternalCall specifies that the method is implemented in the
CLR itself. So, I proceeded to download SSCLI a.k.a. Rotor (source code to a
working implementation of the CLR) from here.



I learnt from this site that the ecall.cpp file (which is located at \clr\src\vm\ecall.cpp
in the SSCLI) contains a table that maps managed internal call methods to
unmanaged C++ implementations. I searched for FormatInt32 and found the
following code:



FCFuncElement("FormatInt32", COMNumber::FormatInt32)



This tells us that the managed FormatInt32 method is actually implemented by the
native C++ function COMNumber::FormatInt32. So, the next question is: where do I
find the implementation of COMNumber::FormatInt32? I noticed that there was a
file named comnumber.cpp in the \clr\src\vm directory. I opened the file and
started to examine COMNumber::FormatInt32. I discovered that it calls the
COMNumber::Int32ToDecChars function, which is defined as follows:



wchar_t* COMNumber::Int32ToDecChars(
    wchar_t* p,
    unsigned int value,
    int digits)
{
    LEAF_CONTRACT
    _ASSERTE(p != NULL);

    while (--digits >= 0 || value != 0) {
        *--p = value % 10 + '0';
        value /= 10;
    }
    return p;
}



As you can see, COMNumber::Int32ToDecChars takes each digit of the integer,
starting from the rightmost and proceeding to the leftmost, converts it to the
equivalent character and writes it into a character buffer from back to front.
It returns a pointer to the first (leftmost) character written. More goes on
inside COMNumber::FormatInt32, but I won't discuss that here; the core work is
done by COMNumber::Int32ToDecChars. So, I would wrap up the discussion of
int.ToString() by saying that it converts the individual digits of an integer to
their equivalent characters, stores them in a string and returns the string.
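To see the backward-writing loop in action outside the CLR, here is a minimal standalone sketch in C++. The loop body is the one quoted above, with a standard assert in place of the CLR's contract macros. FormatUnsigned is a hypothetical helper of my own, not part of the SSCLI, and the buffer size assumes a 32-bit unsigned int.

```cpp
#include <cassert>
#include <string>

// Same loop as the SSCLI function quoted above: write digits backwards,
// starting just before p, and return a pointer to the first digit written.
// digits is the minimum number of digits to emit (zero-padded on the left).
wchar_t* Int32ToDecChars(wchar_t* p, unsigned int value, int digits)
{
    assert(p != nullptr);
    while (--digits >= 0 || value != 0) {
        *--p = static_cast<wchar_t>(value % 10 + L'0');
        value /= 10;
    }
    return p;
}

// Hypothetical helper (not in the SSCLI): formats a non-negative value.
std::wstring FormatUnsigned(unsigned int value)
{
    wchar_t buffer[16];          // ample for the 10 digits of a 32-bit value
    wchar_t* end = buffer + 16;  // one past the last slot; digits grow leftwards
    wchar_t* start = Int32ToDecChars(end, value, 1);  // at least one digit, so 0 gives "0"
    return std::wstring(start, end);
}
```

Note that the caller passes a pointer to the end of the buffer, not the start. Because the rightmost digit is produced first, filling the buffer from the back avoids having to reverse the characters afterwards.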



Next, I tried to figure out what goes on inside int.Parse(). I used .NET
Reflector and found that it is pretty much the reverse of what int.ToString()
does. The string is read character by character; each character is converted to
its equivalent digit and added to a running number after the digits converted
previously have been shifted left by one decimal position (that is, multiplied
by 10).
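The digit-accumulation idea can be sketched in a few lines of C++. This is only an illustration of the technique, not the framework's code: TryParseDecimal is a hypothetical name, it assumes a 32-bit int, and it ignores everything else the real int.Parse() deals with, such as surrounding whitespace and culture-specific formatting.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: parses an optional sign followed by decimal digits.
// Returns false on empty input, a non-digit character, or a value that
// does not fit in a 32-bit signed integer.
bool TryParseDecimal(const std::string& text, int& result)
{
    std::size_t i = 0;
    bool negative = false;
    if (i < text.size() && (text[i] == '-' || text[i] == '+')) {
        negative = (text[i] == '-');
        ++i;
    }
    if (i == text.size()) return false;  // no digits at all

    // Accumulate in a wider type so the range check stays simple.
    const long long limit = negative ? 2147483648LL : 2147483647LL;
    long long value = 0;
    for (; i < text.size(); ++i) {
        const char c = text[i];
        if (c < '0' || c > '9') return false;
        value = value * 10 + (c - '0');   // shift earlier digits left, add the new one
        if (value > limit) return false;  // outside the 32-bit signed range
    }
    result = static_cast<int>(negative ? -value : value);
    return true;
}
```

For the input "1729" the running value goes 1, 17, 172, 1729. Nothing is wrapped in an object and nothing is taken out of one; the characters are simply turned into a number by arithmetic.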



Most C# textbooks provide an example as shown below for boxing/unboxing:



int i = 1729;
object o = i;    // Boxing
int j = (int)o;  // Unboxing



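Boxing wraps a copy of a value in an object, and unboxing copies that same value
back out with a cast. int.ToString() and int.Parse() cannot be used in this
manner: they convert between two different representations (a number and a
sequence of characters) by computation. So, these functions are not even
remotely related to boxing/unboxing.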



My final task was to find out what actually happens during boxing/unboxing. The
available documentation made my task easy. I referred to the following:



1. C# Language Specification

2. Shared Source CLI Essentials - by David Stutz, Ted Neward and Geoff Shilling



Here are two excerpts from Shared Source CLI Essentials:



By default, when an instance of a value type is passed from one location to
another as a method parameter, it is copied in its entirety. At times, however,
developers will want or need to take the value type and use it in a manner
consistent with reference types. In these situations, the value type can be
“boxed”: a reference type instance will be created whose data is the value type,
and a reference to that instance is passed instead. Naturally, the reverse is
also possible, to take the boxed value type and dereference it back into a value
type - this is called “unboxing”.



The box instruction is a typesafe operation that converts a value type instance
to an instance of a reference type that inherits from System.Object. It does so
by making a copy of the instance and embedding it in a newly allocated object.
For every value type defined, the type system defines a corresponding reference
type called the boxed type. The representation of a boxed value is a location
where a value of the value type may be stored; in essence, a single-field
reference type whose field is that of the value type. Note that this boxed type
is never visible to anyone outside the CLI's implementation - the boxed type is
silently generated by the CLI itself, and is not accessible for programmer use.
(It is purely an implementation detail that would have no real utility were it
exposed.)



This is made clearer by the C# language specification. Please refer to section
4.3 of the specification.
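The "single-field reference type" described in the excerpt can be modelled very roughly in C++. Treat this strictly as an analogy: BoxedInt, Box and Unbox are hypothetical names of my own, std::shared_ptr stands in for a garbage-collected reference, and the real CLR also records type information so that unboxing to the wrong type fails. None of that is modelled here.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the hidden "boxed type" of int:
// a heap object with a single field holding a copy of the value.
struct BoxedInt {
    int value;
};

// Analogue of boxing: allocate a new object and copy the value into it.
std::shared_ptr<BoxedInt> Box(int v)
{
    return std::make_shared<BoxedInt>(BoxedInt{v});
}

// Analogue of unboxing: copy the value back out of the object.
int Unbox(const std::shared_ptr<BoxedInt>& o)
{
    assert(o != nullptr);
    return o->value;
}
```

Because Box copies the value, changing the original variable afterwards does not affect the boxed copy, while copying the reference gives a second handle to the same object. Compare this with the digit-by-digit conversions earlier in the post: no characters are produced or consumed here, which is exactly why ToString() and Parse() are a different kind of operation.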


