Breaking News

Wednesday, June 17, 2009

C# data type corresponding to VARCHAR(MAX)

I stumbled into size limitations while specifying VARCHAR(MAX) in C# code.

Following two options can be the C# alternative to VARCHAR(MAX) or NVARCHAR(MAX) that I have tried for SQL Server 2005.

Lets go step by step and verify as well.

STEP 1: Create a table

CREATE TABLE TBL_MAXTEST (LARGE_DATA VARCHAR(MAX))

STEP 2: C# Code
Run the following locs.

2a) Set parameter size equal to minus one (-1)
//Create a brand new object
SqlParameter theParam = new SqlParameter();

//Fill in the public properties
theParam.ParameterName = "@LARGE_DATA";
theParam.SqlDbType = SqlDbType.VarChar;
theParam.Size = -1;
theParam.Value = strLargeData;
theParam.Direction = ParameterDirection.Input;

//Add and enjoy!
cmdSQL.Parameters.Add(theParam);


2b) Or donot specify any column size.
By default the length is assumed to be -1;




STEP 3: Verify data length
Verify the length of the data.

SELECT DATALENGTH(LARGE_DATA) FROM TBL_MAXTEST

The above shall give the length of the data. I have tried it with 7999 bytes, 8000, 8001, and 10000 bytes.

Basically, VARCHAR and NVARCHAR comes in Large-Value Data Types. NVARCHAR supports Unicode characters which means a wider "range" of characters/code pages. And since its Unicode therefore a single character is stored on two bytes; which inherently means NVARCHAR is twice the size of VARCHAR.

While VARCHAR stores one char per byte; and the problem with data types that use one byte to encode each character is that the data type can only represent 256 different characters.

The following table shows the relationship between the large -value data types and their counterparts from earlier versions of SQL Server.

Large-value data types Earlier versions (SQL Server 2000/etc)
varchar(max) text*

nvarchar(max) ntext*

varbinary(max) image



Note that, TEXT, NTEXT, and IMAGE data types are now a deprecated data types.

  • Use char when the sizes of the column data entries are consistent/same.
  • Use varchar when the sizes of the column data entries vary considerably; that is the max and minimum amount of length is known.
  • Use varchar(max) when the sizes of the column data entries vary considerably, and the size might exceed 8,000 bytes.
A table can contain a maximum of 8,060 bytes per row; but this restriction does not apply to varchar(max), nvarchar(max), varbinary(max), text, image, or xml columns.

You can use the large-value data types, i.e. the MAX value, to store up to 2^31-1 bytes of data; precisely 2147483648 – 1 bytes; approximately 2 GB.

5 comments:

  1. Just a minor comment: It's wrong to assume that Unicode uses two bytes per character. The number of bytes per character depend on the encoding scheme (rather than characterset). So, the number of bytes depend on whether you use UTF-8 or UCS-16, etc.

    This is a must read: http://www.joelonsoftware.com/articles/Unicode.html

    ReplyDelete
  2. Thanks for the comment and correction. I have been through this mind jolting article before. Joel clearly explains "Some people are under the misconception[which I was] that Unicode is simply a 16-bit code where each character takes 16 bits and therefore there are 65,536 possible characters. This is not, actually, correct."-

    I should have been more elaborative; Usually when we talk about unicodes, we are talking about UTF16; which uses 2 bytes to represent each character. 2 bytes meaning 16 bits, which is 256x256=65536 codes; this provides enough char codes to represent all the most common characters being used almost all of the languages around the world.

    And then, although UTF32 can handle every character in the universe, I see[http://stackoverflow.com/questions/625018/why-utf-32-instead-of-utf-16-if-we-have-surrogate-pairs] that the industry is far from using UTF32 in practical in near future.

    ReplyDelete
  3. Yeah, UTF32 is not practical but what about UTF-8? Have you ever seen an xml document with a comment as the very first line saying, "UTF-8"?

    ReplyDelete
  4. Not really, but would like to try one out since I have read alot about its performance. Though, UTF-8 is fast, no byte order mark or endian'ness jingle bells associated with it; which makes it more robust and easily interpretable; I would prefer UTF-8 only if I know about the data that I am going to process in my xml; for instance, lets say, English language. Now I know UTF-8 would support all of the characters that I would need. Otherwise I may get a shock[http://stackoverflow.com/questions/932666/problem-with-word-nestle-in-an-xml-doc-utf-8-encoding-using-nxxmlparser-any] or few from time to time, and end up losing some characters after rendering the data to output interface. Then probably I would have to keep my "special characters" in some CDATA section, and probably would require me to either write my own parser or amend one or google one out... well, just thinking... (0:

    Thanks for your comment.

    ReplyDelete
  5. The net grid application demonstrates grid capabilities and its use as a TreeList component and as a full-featured grid with multiple headers.Source codes demonstrate implementation ease of cell editors,filtering and grouping dapfor. com

    ReplyDelete

Designed By Published.. Blogger Templates