Why Strings Are Immutable In Dot Net?



Most developers know that strings in .NET are immutable, but very few know the reasons behind this behavior. I will try to explain them in this article.

Before diving into the reasons, let me first explain what we mean by immutable.

What do we mean by Immutable strings? 

The dictionary meaning of immutable is “unchanging over time or unable to be changed”. This means that once a String object has been created, its value can never be changed. Yes, you read that correctly. Consider the following code:

[Code listing: myString]

The output of this code will be:

abcdef
abcdefghijkl

It seems as if we just changed the value of myString from “abc” to “abcdef” and then to “abcdefghijkl”, but we really didn’t! Let’s try to understand it. In the first step, a new string object is allocated on the heap with the value “abc”, and myString points to this memory location. In step 2 (myString += “def”;), a new string object is allocated on the heap with the value “abcdef”, and myString now points to this new memory location. But the string “abc” still exists on the heap. So we actually end up with two string objects on the heap, even though we are referencing only one of them. Continuing the same way, at the end of this code we will have four string objects, with only one object referenced and the other three unused. The following memory allocation diagram of the above code makes things clearer.
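The behavior described above can be checked with a small sketch. The original listing is not reproduced in this text, so the literals appended and the helper name OriginalIsUnchanged are my assumptions, chosen to match the output shown earlier.

```csharp
using System;

static class ImmutableStringDemo
{
    // Returns true when appending to myString left the original object untouched.
    public static bool OriginalIsUnchanged()
    {
        string myString = "abc";
        string original = myString;   // second reference to the same "abc" object

        myString += "def";            // allocates a new string; myString now points to it

        return original == "abc"
            && myString == "abcdef"
            && !object.ReferenceEquals(original, myString);
    }

    static void Main()
    {
        string myString = "abc";
        myString += "def";
        Console.WriteLine(myString);               // abcdef
        myString += "ghijkl";
        Console.WriteLine(myString);               // abcdefghijkl

        Console.WriteLine(OriginalIsUnchanged());  // True
    }
}
```

Keeping a second reference to the first string is what makes the effect visible: the += operator never touches the old object, it only redirects the variable.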

[Figure: Memory Allocation of immutable strings]

Now let us move on to the question of why.

Why are strings immutable in .NET?

The designers of .NET decided to implement text strings as immutable objects, and they had multiple reasons for this design. First, if a program has multiple string variables with the same value, immutability makes it safe to avoid allocating memory for that value multiple times. Memory is allocated for the string once, and all the variables point to the same memory block. Consider the following block of code.

[Code listing: same string]

Memory allocation for this code will look like this:

[Figure: Memory Allocation of Same String]

If strings were mutable, changing the value of str1 would also change the values of str2 and str3, which is undesirable.
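Here is a minimal sketch of that sharing. The variable names str1, str2 and str3 come from the text above; the literal "hello" and the method names are my assumptions. It relies on the runtime sharing identical string literals within an assembly, which is the behavior I would expect; strings built at run time are not shared automatically.

```csharp
using System;

static class SharedLiteralDemo
{
    // True when three variables initialised from the same literal refer to one object.
    public static bool LiteralsShareOneObject()
    {
        string str1 = "hello";
        string str2 = "hello";
        string str3 = "hello";
        return object.ReferenceEquals(str1, str2)
            && object.ReferenceEquals(str2, str3);
    }

    // True when "changing" str1 leaves str2 untouched.
    public static bool ReassignmentDoesNotAffectOthers()
    {
        string str1 = "hello";
        string str2 = "hello";
        str1 += "!";   // str1 now refers to a new object; the shared one is not modified
        return str1 == "hello!" && str2 == "hello";
    }

    static void Main()
    {
        Console.WriteLine(LiteralsShareOneObject());
        Console.WriteLine(ReassignmentDoesNotAffectOthers());
    }
}
```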

Second, immutable strings eliminate race conditions on string contents in multithreaded applications. Any modification of the text creates a new string object, so there is no need to take a lock to avoid conflicts while multiple threads read the same string simultaneously. In some cases, those race conditions could be used to mount security attacks. For example, you could satisfy a FileIOPermission demand with a string pointing to a publicly accessible section of the file system, and then use another thread to quickly change the string to point to a sensitive file before the underlying CreateFile occurs.

Another reason for string immutability is that it makes strings well suited for use as keys in hash tables. The objects on which the hash values are computed must not change while they serve as keys, so that the hash values stay constant over time.

Another nice consequence of string immutability is that, even though System.String is a class, string objects are compared by value, as if they were value types. This is possible because we can consider the identity of an immutable object to be its state. Consider the following piece of code:
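To see why a changing key is a problem, here is a sketch that uses Dictionary as the hash table. MutableKey is a hypothetical class I made up purely for illustration; it is not part of the framework.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical mutable key type, used only to illustrate the problem.
sealed class MutableKey
{
    public int Id;
    public MutableKey(int id) { Id = id; }

    public override int GetHashCode() { return Id; }

    public override bool Equals(object obj)
    {
        MutableKey other = obj as MutableKey;
        return other != null && other.Id == Id;
    }
}

static class HashKeyDemo
{
    // Mutates a key after insertion and reports whether it can still be found.
    public static bool MutatedKeyIsStillFound()
    {
        var table = new Dictionary<MutableKey, string>();
        var key = new MutableKey(1);
        table[key] = "value";

        key.Id = 2;                      // the hash code changes after insertion
        return table.ContainsKey(key);   // the entry was filed under the old hash code
    }

    // Looks up a string key through a different object with the same characters.
    public static bool StringKeyIsFound()
    {
        var table = new Dictionary<string, string>();
        table["abc"] = "value";

        string sameValue = new string(new[] { 'a', 'b', 'c' });
        return table.ContainsKey(sameValue);
    }

    static void Main()
    {
        Console.WriteLine(MutatedKeyIsStillFound());  // False
        Console.WriteLine(StringKeyIsFound());        // True
    }
}
```

A string can never get into the first situation, because there is no way to change its characters after it has been stored as a key.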

[Code listing: String Comparison]

Even though str1 and str2 reference two different objects, the above code returns true.
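The original listing is not available in this text, so the following is only a sketch of such a comparison. The way str2 is built (from a char array, to force a separate object) and the method names are my assumptions.

```csharp
using System;

static class StringEqualityDemo
{
    // Compares the two strings with ==, which String overloads to compare values.
    public static bool SameValue()
    {
        string str1 = "abc";
        string str2 = new string(new[] { 'a', 'b', 'c' });   // a separate object on the heap
        return str1 == str2;
    }

    // Compares the two references, ignoring the overloaded operator.
    public static bool SameObject()
    {
        string str1 = "abc";
        string str2 = new string(new[] { 'a', 'b', 'c' });
        return object.ReferenceEquals(str1, str2);
    }

    static void Main()
    {
        Console.WriteLine(SameValue());   // True
        Console.WriteLine(SameObject());  // False
    }
}
```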

StringBuilder : An alternative to avoid creation of unused strings

As we saw in the figure “Memory Allocation of immutable strings”, unused strings are left allocated in memory. This is because of the way strings behave. If code performs thousands of operations on a string, the heap will hold thousands of unused string objects until the garbage collector reclaims them, which wastes memory and time. Fortunately, we can avoid this by using the StringBuilder class. I will discuss this class in my next article.

I hope you enjoyed the article. Please leave your comments and feedback in the comments section at the bottom. If you have any doubts or queries, please feel free to ask in the comments. Thanks for reading.
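As a small preview, here is a sketch of the two approaches side by side. The method names and the comma-separated output format are my own choices for illustration.

```csharp
using System;
using System.Text;

static class StringBuilderDemo
{
    // Appends into one growing buffer; a string is created only by ToString().
    public static string BuildWithStringBuilder(int count)
    {
        var builder = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            builder.Append(i).Append(',');
        }
        return builder.ToString();
    }

    // Creates a new, longer string on every pass through the loop.
    public static string BuildWithConcatenation(int count)
    {
        string result = "";
        for (int i = 0; i < count; i++)
        {
            result += i + ",";
        }
        return result;
    }

    static void Main()
    {
        Console.WriteLine(BuildWithStringBuilder(3));   // 0,1,2,
        Console.WriteLine(BuildWithConcatenation(3));   // 0,1,2,
    }
}
```

Both methods return the same text; the difference is in how many intermediate string objects are left behind on the heap.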


Next Read : An Extensive Examination Of ArrayList in C#

An Extensive Examination Of ArrayList in C#



We all use arrays in C# and other programming languages. Arrays impose some limitations on design. First, arrays are homogeneous, i.e. you can store only one type of element. Second, when using arrays you must allocate a specific number of elements up front. Often developers want something more flexible, especially when the size of the collection is uncertain. The .NET Framework Base Class Library provides such a data structure, called ArrayList, located in the System.Collections namespace.

ArrayList is nothing but a Heterogeneous and Self Re-Dimensioning Array :

Elements of different types can be added to the ArrayList. Further, we do not have to concern ourselves with redimensioning the ArrayList. All of this is handled automatically for us. An example of the ArrayList in action can be seen in the code snippet below.

[Code listing: Array List Example]
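The original snippet is not reproduced in this text, so here is a minimal sketch of what such an example might look like. The values added and the method name BuildList are my assumptions.

```csharp
using System;
using System.Collections;

static class ArrayListExample
{
    // Builds a list holding elements of several unrelated types.
    public static ArrayList BuildList()
    {
        var list = new ArrayList();
        list.Add(42);               // int (boxed)
        list.Add("hello");          // string
        list.Add(3.14);             // double (boxed)
        list.Add(DateTime.Today);   // DateTime struct (boxed)
        return list;
    }

    static void Main()
    {
        // No size was declared anywhere; the list grows as elements are added.
        foreach (object item in BuildList())
        {
            Console.WriteLine("{0}: {1}", item.GetType().Name, item);
        }
    }
}
```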

Behind The Scenes :

Behind the scenes, the ArrayList uses a System.Array of type Object. An object array can hold elements of any type, since all types are derived from Object (either directly or indirectly). The default capacity of this array was 16 in .NET Framework 1.x; later versions start with an empty array and grow it on the first Add(). The capacity can also be set in the constructor or by assigning the Capacity property. Elements are added to the ArrayList using the Add() method. Behind the scenes, Add() first compares the number of elements in the array with its capacity. If adding the new element would cause the count to exceed the capacity, the array is redimensioned and the capacity is automatically doubled.
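You can watch this happen through the Capacity property. The sketch below records each new capacity value as elements are added; the helper name ObserveCapacities is mine, and the exact numbers printed depend on the framework version, so I have not listed them.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

static class CapacityGrowthDemo
{
    // Adds `count` items and records every distinct Capacity value observed.
    public static List<int> ObserveCapacities(int count)
    {
        var list = new ArrayList();
        var seen = new List<int> { list.Capacity };

        for (int i = 0; i < count; i++)
        {
            list.Add(i);
            if (list.Capacity != seen[seen.Count - 1])
            {
                seen.Add(list.Capacity);   // the internal array was just redimensioned
            }
        }
        return seen;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", ObserveCapacities(100)));
    }
}
```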

Performance :

ArrayList provides some additional flexibility compared to an array, but this flexibility comes at the cost of performance, especially if you store value types. The ArrayList’s internal array is of type object, so every value type is boxed and stored on the heap, and each ArrayList element is a reference to a boxed value. When you access a value type element, it must be unboxed before you can use it.

The boxing and unboxing, along with the extra level of indirection that comes with using value types in an ArrayList, can hamper the performance of your application when using large ArrayLists with many reads and writes.
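The sketch below shows where the boxing and unboxing happen in ordinary code. The method names are my own; note that unboxing must name the exact type that was boxed, otherwise the cast fails at run time.

```csharp
using System;
using System.Collections;

static class BoxingDemo
{
    // Sums a list of boxed ints; each cast copies the value out of its heap object.
    public static int SumInts(ArrayList list)
    {
        int sum = 0;
        foreach (object item in list)
        {
            sum += (int)item;   // unboxing
        }
        return sum;
    }

    // Shows that a boxed int cannot be unboxed directly as a long.
    public static bool UnboxingToWrongTypeThrows()
    {
        var list = new ArrayList();
        list.Add(42);   // boxing: the int is wrapped in a heap object

        try
        {
            long value = (long)list[0];   // the boxed value is an Int32, not an Int64
            Console.WriteLine(value);
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    static void Main()
    {
        var list = new ArrayList();
        list.Add(1);
        list.Add(2);
        list.Add(3);
        Console.WriteLine(SumInts(list));                // 6
        Console.WriteLine(UnboxingToWrongTypeThrows());  // True
    }
}
```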

[Figure: ArrayList Data Structure Memory Allocation. The ArrayList contains a contiguous block of object references.]

The above diagram shows the memory allocation for ArrayList.

The self-redimensioning of ArrayList need not cause a performance degradation compared to an array, because you can avoid redimensioning altogether by specifying a sufficient initial capacity in the constructor. And if you don’t know the exact size, you would have to resize a plain array as well once the number of elements inserted exceeds its size.
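A short sketch of this, assuming we know the number of elements in advance (the method name CapacityAfterFilling is mine):

```csharp
using System;
using System.Collections;

static class PreallocationDemo
{
    // Fills a list created with room for exactly `expectedCount` elements
    // and returns its capacity afterwards.
    public static int CapacityAfterFilling(int expectedCount)
    {
        var list = new ArrayList(expectedCount);   // capacity fixed up front
        for (int i = 0; i < expectedCount; i++)
        {
            list.Add(i);
        }
        return list.Capacity;
    }

    static void Main()
    {
        // The capacity is unchanged, so no redimensioning took place.
        Console.WriteLine(CapacityAfterFilling(1000));   // 1000
    }
}
```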

Memory Allocation on Redimensioning:

Why does the size of the ArrayList get doubled when it is redimensioned? It’s a classic computer science problem: how much extra memory should be allocated when some buffer runs out of space?

One option is to allocate just one more element when redimensioning, i.e. if the initial size of the array was 10, resize the array to 11 when adding the 11th element. This approach conserves the most memory but becomes very costly, as redimensioning is required on the insertion of every additional element.

The second option is to redimension the array to 100 or 200 times its current size, i.e. if the array is initially allocated 10 elements, resize it to 1000 elements before inserting the 11th element. This approach greatly reduces the redimensioning overhead, but if only a few more elements need to be added, the extra allocated space is wasted.

So, after weighing the options, a good compromise is to just double the size of the array when it is exhausted. This is precisely the approach that ArrayList takes, and it’s all done automatically for us.
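To put rough numbers on the trade-off, here is a small simulation that counts how many element copies each strategy needs. It is not ArrayList’s actual code; the class, the method names and the starting capacity of 1 are my assumptions for illustration.

```csharp
using System;

static class GrowthStrategyDemo
{
    // Counts the element copies needed to append `count` items to a buffer
    // that starts at `initialCapacity` and is regrown with `grow` when full.
    public static long CountCopies(int count, int initialCapacity, Func<int, int> grow)
    {
        int capacity = initialCapacity;
        long copies = 0;

        for (int size = 0; size < count; size++)
        {
            if (size == capacity)
            {
                copies += size;              // every existing element moves to the new array
                capacity = grow(capacity);
            }
        }
        return copies;
    }

    public static long GrowByOne(int count)
    {
        return CountCopies(count, 1, c => c + 1);
    }

    public static long GrowByDoubling(int count)
    {
        return CountCopies(count, 1, c => c * 2);
    }

    static void Main()
    {
        Console.WriteLine(GrowByOne(1000));        // 499500
        Console.WriteLine(GrowByDoubling(1000));   // 1023
    }
}
```

Growing by one copies 1 + 2 + ... + 999 elements for 1000 insertions, while doubling copies only 1 + 2 + 4 + ... + 512, at the price of ending with room for 1024 elements.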

Summary :

  1. ArrayList internally uses an array of type object.
  2. ArrayList provides more flexibility than a simple array.
  3. The precise size of an ArrayList can be set in the constructor or through the Capacity property. The default capacity was 16 in .NET Framework 1.x; later versions start with an empty array.
  4. While adding, if the number of elements would exceed the capacity, the array is redimensioned to double its current size.
  5. Boxing and unboxing of value types degrades the performance of ArrayList.

Next Read : Why Strings Are Immutable In Dot Net?