Question

We have reference values created from a Sequence in a database, which means that they are all integers. (It's not inconceivable - although massively unlikely - that they could change in the future to include letters, e.g. R12345.)

In our [C#] code, should these be typed as strings or integers?

Does the fact that it wouldn't make sense to perform any arithmetic on these values (e.g. adding them together) mean that they should be treated as string literals? If not, and they should be typed as integers (/longs), then what is the underlying principle/reason behind this?

I've searched for an answer to this, but not managed to find anything, either on Google or StackOverflow, so your input is very much appreciated.

Solution

There are a couple of other differences:

Leading Zeroes:

Do you need to allow for these? If you do, the ID has to be a string, because an integer cannot preserve them.

Sorting:

Sort order will vary between the types:

Integer:

1
2
3
10
100

String:

1
10
100
2
3

So will you have a requirement to put the sequence in order (either way around)?
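As a rough illustration of the two orderings above, here is a minimal C# sketch. It assumes C# 9 or later (top-level statements), and the variable names are made up for the example. `StringComparer.Ordinal` is used so that the string sort does not depend on the current culture.

```csharp
using System;
using System.Linq;

// The same five reference values, once as integers and once as strings.
int[] asInts = { 100, 3, 10, 1, 2 };
string[] asStrings = asInts.Select(n => n.ToString()).ToArray();

// Integers sort by numeric value.
int[] sortedInts = asInts.OrderBy(n => n).ToArray();

// Strings sort character by character, so "10" comes before "2".
string[] sortedStrings = asStrings.OrderBy(s => s, StringComparer.Ordinal).ToArray();

Console.WriteLine(string.Join(", ", sortedInts));     // 1, 2, 3, 10, 100
Console.WriteLine(string.Join(", ", sortedStrings));  // 1, 10, 100, 2, 3
```

If the values are stored as strings but numeric order is needed, they would have to be parsed (or padded to a fixed width) before sorting.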

The same arguments apply to the typing in your code as to the DB itself, since the requirements there are likely to be the same. Ideally, as Chris says, the two should be consistent.

OTHER TIPS

Here are a few things to consider:

  1. Are leading zeros important, i.e. is 010 different from 10? If so, use string.
  2. Is the sort order important, i.e. should 200 be sorted before or after 30?
  3. Is the speed of sorting and/or equality checking important? If so, use int.
  4. Are you at all limited in memory or disk space? If so, ints are 4 bytes, whereas strings take at least 1 byte per character (in memory, a C# string uses 2 bytes per character plus object overhead).
  5. Will int provide enough unique values? A string can support a potentially unlimited number of unique values.
  6. Is there any sort of link in the system that isn't guaranteed to be reliable (networking, user input, etc.)? If it's a text medium, int values are safer (all non-digit characters are erroneous); if it's binary, strings make for easier visual inspection (R13_55 is clearly an error if your IDs are just alphanumeric, but is 12372?).
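A few of the points above (leading zeros, the range of int, and rejecting non-digit input) can be checked directly. This is a small sketch assuming C# 9 or later; the sample values, including "R12345" taken from the question, are purely illustrative.

```csharp
using System;
using System.Globalization;

// Leading zeros: parsing discards them, so "010" and "10" become the same int.
int a = int.Parse("010", CultureInfo.InvariantCulture);
int b = int.Parse("10", CultureInfo.InvariantCulture);
Console.WriteLine(a == b);            // True
Console.WriteLine("010" == "10");     // False

// Range: the largest values an int and a long can hold.
Console.WriteLine(int.MaxValue);      // 2147483647
Console.WriteLine(long.MaxValue);     // 9223372036854775807

// Validation: NumberStyles.None accepts digits only, so a value
// containing a letter is rejected rather than silently accepted.
bool ok = int.TryParse("R12345", NumberStyles.None, CultureInfo.InvariantCulture, out _);
Console.WriteLine(ok);                // False
```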

From the sounds of your description, these are values that currently happen to be represented by a series of digits; they are not actually numbers in themselves. This, incidentally, is just like my phone number: it is not a single number, it is a set of digits.

And, like my phone number, I would suggest storing it as a string. Leading zeros don't appear to be an issue here but considering you are treating them as strings, you may as well store them as such and give yourself the future flexibility.

They should be typed as integers and the reason is simply this: retain the same type definition wherever possible to avoid overhead or unexpected side-effects of type conversion.

There are good reasons not to use types like int, string, and long all over your code. Among other problems, this allows for stupid errors like

  • using a key for one table in a query pertaining to another table
  • doing arithmetic on a key and winding up with a nonsense result
  • confusing an index or other integral quantity with a key

and communicates very little information: given int id, which table does it refer to, and what kind of entity does it signify? You need to encode this in parameter/variable/field/method names, and the compiler won't help you with that.

Since it's likely those values will always be integers, using an integral type should be more efficient and put less load on the GC. But to prevent the aforementioned errors, you could use an (immutable, of course) struct containing a single field. It doesn't need to support anything but a constructor and a getter for the id; that's enough to solve the above problems except in the few pieces of code that need the actual value of the key (to build a query, for example).
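One possible shape for such a wrapper is sketched below. It assumes C# 10 or later (for `readonly record struct`, which generates the constructor, the getter, and value equality). The names `OrderId`, `CustomerId`, and `Describe` are hypothetical and only serve the illustration.

```csharp
using System;

var orderId = new OrderId(12345);
var customerId = new CustomerId(12345);

Console.WriteLine(Describe(orderId));                // order #12345
Console.WriteLine(orderId.Equals(new OrderId(12345))); // True
Console.WriteLine(customerId.Value);                 // 12345

// Describe(customerId);          // does not compile: CustomerId is not an OrderId
// var sum = orderId + orderId;   // does not compile: no arithmetic is defined

// Only code that needs the raw value (e.g. to build a query) unwraps it.
static string Describe(OrderId id) => $"order #{id.Value}";

// Hypothetical key types: one immutable struct per kind of key.
public readonly record struct OrderId(int Value);
public readonly record struct CustomerId(int Value);
```

Because each key kind is its own type, mixing up keys from different tables becomes a compile-time error, while the value itself is still stored as a plain int with no heap allocation.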

That said, using a proper ORM also solves these problems, with less work on your side. ORMs have their own share of downsides, but they're really not that bad.

If you don't need to perform some mathematical calculations on the sequences, you can easily choose strings.

But think about sorting: the resulting order differs between integers and strings, e.g. 1, 2, 10 for integers and 1, 10, 2 for strings.

Licensed under: CC-BY-SA with attribution