Question

Usually in C, we have to tell the compiler the type of data in a variable declaration. E.g. in the following program, I want to print the sum of two floating-point numbers X and Y.

#include <stdio.h>

int main(void)
{
  float X = 5.2;
  float Y = 5.1;
  float Z;

  Z = Y + X;
  printf("%f\n", Z);
  return 0;
}

I had to tell the compiler the type of variable X.

  • Can't the compiler determine the type of X on its own?

Yes, it can if I do this:

#define X 5.2

I can now write my program without telling the compiler the type of X:

#include <stdio.h>
#define X 5.2

int main(void)
{
  float Y = 5.1;
  float Z;

  Z = Y + X;
  printf("%f\n", Z);
  return 0;
}

So it seems that the C language has some kind of feature with which it can determine the type of data on its own. In my case it determined that X is of type float.

  • Why do we have to mention the type of data when we declare something in main()? Why can't the compiler determine the data type of a variable on its own in main(), as it does with #define?

Solution

You are comparing variable declarations to #defines, which is incorrect. With a #define, you create a mapping between an identifier and a snippet of source code. The C preprocessor will then literally substitute any occurrences of that identifier with the provided snippet. Writing

#define FOO 40 + 2
int foos = FOO + FOO * FOO;

ends up being the same thing to the compiler as writing

int foos = 40 + 2 + 40 + 2 * 40 + 2;

Think of it as automated copy&paste. Note that, because of operator precedence, this evaluates to 164 rather than the 1806 you would get if FOO were a real variable holding 42. That is why macro bodies are usually wrapped in parentheses: #define FOO (40 + 2).

Also, normal variables can be reassigned, while a macro created with #define cannot (although you can re-#define it). The expression FOO = 7 would be a compiler error, since we can't assign to “rvalues”: 40 + 2 = 7 is illegal.
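
To make that concrete, here is a minimal sketch (reusing FOO from above) showing that a variable can be reassigned while a macro cannot:

#include <stdio.h>

#define FOO 40 + 2

int main(void)
{
  int x = 10;
  x = 7;          /* fine: x is a variable, an lvalue */
  /* FOO = 7; */  /* compile error if uncommented: expands to 40 + 2 = 7 */
  printf("%d %d\n", x, FOO);  /* prints: 7 42 */
  return 0;
}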

So, why do we need types at all? Some languages get rid of type annotations entirely; this is especially common in scripting languages. However, they usually have something called “dynamic typing”, where variables don't have fixed types, but values do. While this is far more flexible, it's also less performant. C likes performance, so it has a very simple and efficient concept of variables:

There's a stretch of memory called the “stack”. Each local variable corresponds to an area on the stack. Now the question is: how many bytes long does this area have to be? In C, each type has a well-defined size, which you can query via sizeof(type). The compiler needs to know the type of each variable so that it can reserve the correct amount of space on the stack.
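
You can check those sizes yourself with sizeof. A minimal sketch; the exact numbers are implementation-defined, and the comments show common values on a 64-bit platform:

#include <stdio.h>

int main(void)
{
  printf("char:   %zu\n", sizeof(char));    /* always 1 by definition */
  printf("int:    %zu\n", sizeof(int));     /* commonly 4 */
  printf("float:  %zu\n", sizeof(float));   /* commonly 4 */
  printf("double: %zu\n", sizeof(double));  /* commonly 8 */
  return 0;
}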

Why don't constants created with #define need a type annotation? They are not stored on the stack. Instead, #define creates reusable snippets of source code in a slightly more maintainable manner than copy&paste. Literals in the source code such as "foo" or 42.87 are stored by the compiler either inline in the machine instructions (as immediate operands), or in a separate data section of the resulting binary.

However, literals do have types. A string literal such as "foo" has type char[] and usually decays to a char *. 42 is an int but can also be used for shorter types (narrowing conversion). 42.8 would be a double. If you have a literal and want it to have a different type (e.g. to make 42.8 a float, or 42 an unsigned long int), you can use suffixes: a letter after the literal that changes how the compiler treats it. In our case, we might say 42.8f or 42ul.
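
A minimal sketch of suffixes in action (the variable names are arbitrary):

#include <stdio.h>

int main(void)
{
  float         f  = 42.8f;  /* 'f' suffix: a float literal instead of double */
  double        d  = 42.8;   /* no suffix: double */
  unsigned long ul = 42ul;   /* 'ul' suffix: unsigned long int */

  printf("%f %f %lu\n", f, d, ul);
  return 0;
}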

Some languages have static typing as in C, but the type annotations are optional. Examples are ML, Haskell, Scala, C#, C++11, and Go. How does that work? Magic? No, this is called “type inference”. In C# and Go, the compiler looks at the right-hand side of an assignment and deduces the type from that. This is fairly straightforward if the right-hand side is a literal such as 42ul: then it's obvious what the type of the variable should be. Other languages have more complex algorithms that also take into account how a variable is used. E.g. if you write x/2, then x can't be a string but must have some numeric type.
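
Incidentally, C itself later gained a limited form of this: C23 allows auto with an initializer (and GCC has long offered __auto_type as an extension). A minimal sketch, assuming a C23-capable compiler (e.g. gcc -std=c23):

#include <stdio.h>

int main(void)
{
  auto x = 42ul;  /* type deduced from the literal: unsigned long */
  auto y = 42.8;  /* type deduced: double */
  printf("%lu %f\n", x, y);
  return 0;
}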

OTHER TIPS

X in the second example is never a float. It is a macro: the preprocessor replaces every occurrence of the identifier X in the source with the defined value before the compiler ever sees the code.

In the case of the supplied code, before compilation the preprocessor changes the code

Z=Y+X;

to

Z=Y+5.2;

and that is what gets compiled.

That means you can also replace those 'values' with code like

#define X sqrt(Y)

or even

#define X Y
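
For example, a minimal compilable sketch of the sqrt variant (Y and Z reuse the names from the question; on some systems you must link with -lm):

#include <stdio.h>
#include <math.h>

#define X sqrt(Y)  /* every use of X becomes sqrt(Y) */

int main(void)
{
  double Y = 5.1;
  double Z = Y + X;  /* the preprocessor turns this into Z = Y + sqrt(Y); */
  printf("%f\n", Z);
  return 0;
}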

The short answer is that C needs types because of history, and because its types closely represent the hardware.

History: C was developed in the early 1970s and intended as a language for systems programming. Code should ideally be fast and make the best use of the hardware's capabilities.

Inferring types at compile time would have been possible, but it would have increased the already slow compile times (refer to XKCD's 'compiling' cartoon, which applied even to 'hello world' for at least 10 years after C was published). Inferring types at runtime would not have fitted the aims of systems programming: runtime inference requires an additional runtime library, and C came long before the first PC, which had 256 KB of RAM. Not gigabytes or megabytes, but kilobytes.

In your example, if you omit the types

   X=5.2;
   Y=5.1;

   Z=Y+X;

then the compiler could have happily worked out that X and Y are floating-point numbers and made Z the same. In fact, a modern compiler would also work out that X and Y aren't needed and just set Z to 10.3.

Assume that the calculation is embedded inside a function. The function writer might want to use their knowledge of the hardware, or of the problem being solved.

Would a double be more appropriate than a float? It takes more memory and may be slower, but the accuracy of the result would be higher.

Maybe the return value of the function could be int (or long) because the decimals were not important, although conversion from float to int is not without cost.

The return value could also be made double, guaranteeing that float + float does not overflow.
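
A small sketch of that last point, using FLT_MAX from <float.h>. (Depending on FLT_EVAL_METHOD the addition itself may happen in higher precision, but storing the result back into a float still forces the overflow.)

#include <stdio.h>
#include <float.h>

int main(void)
{
  float a = FLT_MAX;
  float b = FLT_MAX;

  float  as_float  = a + b;          /* exceeds the float range: inf */
  double as_double = (double)a + b;  /* fits comfortably in a double */

  printf("as float:  %g\n", as_float);   /* prints: inf */
  printf("as double: %g\n", as_double);  /* prints roughly 6.80565e+38 */
  return 0;
}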

All of these questions seem pointless for the vast majority of code written today, but were vital when C was produced.

C doesn't have type inference (that's what it is called when a compiler works out the type of a variable for you) because it is old: it was developed in the early 1970s.

Many newer languages have systems that allow you to use variables without specifying their type (Ruby, JavaScript, Python, etc.).

Licensed under: CC-BY-SA with attribution