Why is declaration of data and functions necessary in C language, when the definition is written at the end of the source code?

Consider the following “C” code:

#include<stdio.h>
main()
{   
  printf("func:%d",Func_i());   
}

Func_i()
{
  int i=3;
  return i;
}

Func_i() is defined at the end of the source code and no declaration is provided before its use in main(). At the very time when the compiler sees Func_i() in main(), it comes out of main() and finds Func_i(). The compiler somehow finds the value returned by Func_i() and gives it to printf(). I also know that the compiler cannot find the return type of Func_i(). By default it takes (guesses?) the return type of Func_i() to be int. That is, if the code had float Func_i(), then the compiler would give the error: conflicting types for Func_i().

From the above discussion we see that:

1. The compiler can find the value returned by Func_i().
   - If the compiler can find the value returned by Func_i() by coming out of main() and searching down the source code, then why can’t it find the type of Func_i(), which is explicitly mentioned?
2. The compiler must know that Func_i() is of type float; that’s why it gives the error of conflicting types.
   - If the compiler knows that Func_i() is of type float, then why does it still assume Func_i() to be of type int and give the error of conflicting types? Why doesn’t it forcefully make Func_i() of type float?

I’ve the same doubt with the variable declaration. Consider the following “C” code:

#include<stdio.h>
main()
{
  /* [extern int Data_i;]--omitted the declaration */
  printf("func:%d and Var:%d",Func_i(),Data_i);
}

 Func_i()
{
  int i=3;
  return i;
}
int Data_i=4;

The compiler gives the error: ‘Data_i’ undeclared (first use in this function).

When the compiler sees Func_i(), it goes down the source code to find the value returned by Func_i(). Why can’t the compiler do the same for the variable Data_i?

Edit:

I don’t know the details of the inner workings of the compiler, assembler, processor, etc. The basic idea of my question is this: if I write the return value of a function at the end of the source code, after the use of that function, the “C” language allows the computer to find that value without giving any error. Why can’t the computer find the type in the same way? Why can’t the type of Data_i be found the way Func_i()’s return value was found? Even if I use an "extern data-type identifier;" statement, I am not telling the compiler the value to be returned by that identifier (function or variable). If the computer can find that value, then why can’t it find the type? Why do we need the forward declaration at all?

Thank you.

Because C is a single-pass, statically-typed, weakly-typed, compiled language.

1. Single-pass means the compiler does not look ahead to see the definition of a function or variable. Since the compiler does not look ahead, the declaration of a function must come before the use of the function, otherwise the compiler does not know what its type signature is. However, the definition of the function can be later on in the same file, or even in a different file altogether. See point #4.

   The only exception is the historical artifact that a call to an undeclared function implicitly declares it as returning “int”, and that a declaration written without a type defaults to “int”. Undeclared variables get no such treatment. Modern practice is to avoid implicit typing by always declaring functions and variables explicitly.
2. Statically-typed means that all type information is computed at compile time. That information is then used to generate machine code that executes at run time. There is no concept in C of run-time typing. Once an int, always an int; once a float, always a float. However, that fact is somewhat obscured by the next point.
3. Weakly-typed means that the C compiler automatically generates code to convert between numeric types without requiring the programmer to explicitly specify the conversion operations. Because of static typing, the same conversion will always be carried out in the same way each time through the program. If a float value is converted to an int value at a given spot in the code, a float value will always be converted to an int value at that spot in the code. This cannot be changed at run time. The value itself may change from one execution of the program to the next, of course, and conditional statements may change which sections of code are run in what order, but a given single section of code without function calls or conditionals will always perform the exact same operations whenever it is run.
4. Compiled means that the process of analyzing the human-readable source code and transforming it into machine-readable instructions is fully carried out before the program runs. When the compiler is compiling a function, it has no knowledge of what it will encounter further down in a given source file. However, once compilation (and assembly, linking, etc.) has completed, each function in the finished executable contains numeric pointers to the functions that it will call when it is run. That is why main() can call a function further down in the source file. By the time main() is actually run, it will contain a pointer to the address of Func_i().

Machine code is very, very specific. The code for adding two integers (3 + 2) is different from the one for adding two floats (3.0 + 2.0). Those are both different from adding an int to a float (3 + 2.0), and so on. The compiler determines for every point in a function what exact operation needs to be carried out at that point, and generates code that carries out that exact operation. Once that has been done, it cannot be changed without recompiling the function.

Putting all these concepts together, the reason that main() cannot “see” further down to determine the type of Func_i() is that type analysis occurs at the very beginning of the compilation process. At that point, only the part of the source file up to the definition of main() has been read and analyzed, and the definition of Func_i() is not yet known to the compiler.

The reason that main() can “see” where Func_i() is to call it is that calling happens at run time, after compilation has already resolved all of the names and types of all of the identifiers, assembly has already converted all of the functions to machine code, and linking has already inserted the correct address of each function in each place it is called.

I have, of course, left out most of the gory details. The actual process is much, much more complicated. I hope that I have provided enough of a high-level overview to answer your questions.

Additionally, please remember, what I have written above specifically applies to C.

In other languages, the compiler may make multiple passes through the source code, and so the compiler could pick up the definition of Func_i() without it being predeclared.

In other languages, functions and / or variables may be dynamically typed, so a single variable could hold, or a single function could be passed or return, an integer, a float, a string, an array, or an object at different times.

In other languages, typing may be stronger, requiring conversion from floating-point to integer to be explicitly specified. In yet other languages, typing may be weaker, allowing conversion from the string “3.0” to the float 3.0 to the integer 3 to be carried out automatically.

And in other languages, code may be interpreted one line at a time, or compiled to byte-code and then interpreted, or just-in-time compiled, or put through a wide variety of other execution schemes.

A design constraint of the C language was that it was supposed to be compiled by a single-pass compiler, which makes it suitable for very memory-constrained systems. Therefore, the compiler knows at any point only about stuff that was mentioned before. The compiler can’t skip forward in the source to find a function declaration and then go back to compile a call to that function. Therefore, all symbols ought to be declared before they are used. You can pre-declare a function like

int Func_i();

at the top or in a header file to help the compiler.

In your examples, you use two dubious features of the C language that should be avoided:

1. If a function is used before it was properly declared, that use is treated as an “implicit declaration”. The compiler uses the immediate context to figure out the function signature. The compiler will not scan through the rest of the code to figure out what the real declaration is.
2. If something is declared without a type, the type is taken to be int. This is the case, for example, for static variables or function return types.

So in printf("func:%d",Func_i()), we have an implicit declaration int Func_i(). When the compiler reaches the function definition Func_i() { ... }, this is compatible with that type. But if you wrote float Func_i() { ... } at this point, you would have the implicitly declared int Func_i() and the explicitly declared float Func_i(). Since the two declarations don’t match, the compiler gives you an error.

Clearing up some misconceptions

The compiler does not find the value returned by Func_i. The absence of an explicit type means that the return type is int by default. Even if you do this:
```
Func_i() {
    float f = 42.3;
    return f;
}
```
then the type will be int Func_i(), and the return value will be silently truncated!
The compiler eventually gets to know the real type of Func_i, but it does not know the real type during the implicit declaration. Only when it later reaches the real declaration can it find out whether the implicitly declared type was correct. But at that point, the assembly for the function call might already have been written and can’t be changed in the C compilation model.

First, your programs are valid under the C90 standard, but not under the later ones. Implicit int (declaring a function without giving its return type) and implicit function declarations (using a function without declaring it) are no longer valid as of C99.

Second, that doesn’t work as you think.

Result types are optional in C90; not giving one means an int result. That is also true for variable declarations (but you have to give a storage class, static or extern).
When the compiler sees Func_i called without a previous declaration, it assumes that there is a declaration
```
extern int Func_i();
```
It doesn’t look further in the code to see how Func_i is actually declared. If Func_i weren’t declared or defined anywhere, the compiler would not change its behavior when compiling main. The implicit declaration exists only for functions; there is none for variables.

Note that the empty parameter list in the declaration doesn’t mean the function takes no parameters (you need to specify (void) for that). It means that the compiler doesn’t have to check the types of the arguments and will apply the same implicit conversions (the default argument promotions) that are applied to arguments passed to variadic functions.

You wrote in a comment:

The execution is done line-by-line. The only way to find the value returned by Func_i() is to jump out of the main

That’s a misconception: execution isn’t done line by line. Compilation is done line by line, and name resolution is done during compilation, and it only resolves names, not return values.

A helpful conceptual model is this: When the compiler reads the line:

  printf("func:%d",Func_i());

it emits code equivalent to:

  1. call "function #2" and put the return value on the stack
  2. put the constant string "func:%d" on the stack
  3. call "function #1"

The compiler also makes a note in some internal table that function #2 is a not yet declared function named Func_i, that takes an unspecified number of arguments and returns an int (the default).

Later, when it parses this:

 int Func_i() { ...

the compiler looks up Func_i in the table mentioned above and checks if the parameters and the return type match. If they don’t, it stops with an error message. If they do, it adds the current address to the internal function table and goes on to the next line.

So, the compiler didn’t “look” for Func_i when it parsed the first reference. It simply made a note in some table, then went on parsing the next line. And at the end of the file, it has an object file and a list of jump addresses.

Later, the linker takes all this, and replaces all pointers to “function #2” with the actual jump address, so it emits something like:

  call 0x0001215 and put the result on the stack
  put constant ... on the stack
  call ...
...
[at offset 0x0001215 in the file, compiled result of Func_i]:
  put 3 on the stack
  return top of the stack

Much later, when the executable file is run, the jump address is already resolved, and the computer can just jump to address 0x1215. No name lookup required.

Disclaimer: As I said, that’s a conceptual model, and the real world is more complicated. Compilers and linkers do all kinds of crazy optimizations today. They might even “jump up and down” to look for Func_i, although I doubt it. But the C language is defined in a way that you could write a super-simple compiler like that. So most of the time, it’s a very useful model.

C and a number of other languages which require declarations were designed in an era when processor time and memory were expensive. The development of C and Unix went hand in hand for quite some time, and the latter didn’t have virtual memory until 3BSD appeared in 1979. Without the extra room to work, compilers tended to be single-pass affairs because they didn’t require the ability to keep some representation of the entire file in memory all at once.

Single-pass compilers are, like us, saddled with an inability to see into the future. This means the only things they can know for sure are what they’ve been told explicitly before the line of code being compiled. It’s plain to either of us that Func_i() is declared later in the source file, but the compiler, which operates on a small chunk of code at a time, has no clue it’s coming.

In early C (AT&T, K&R, C89), use of a function foo() before declaration resulted in a de facto or implicit declaration of int foo(). Your example works when Func_i() is declared int because it matches what the compiler declared on your behalf. Changing it to any other type will result in a conflict because it no longer matches what the compiler chose in the absence of an explicit declaration. This behavior was removed in C99, where use of an undeclared function became an error.

So what about return types?

Calling a function in object code in most environments requires knowing only the address of the function being called, which is relatively easy for compilers and linkers to deal with. Execution jumps to the start of the function and comes back when it returns. Anything else, notably the passing of arguments and a return value, is determined entirely by the caller and callee in an arrangement called a calling convention. As long as both share the same set of conventions, a program can call functions in other object files regardless of the language they were compiled from. (In scientific computing, you run into a lot of C calling FORTRAN and vice versa, and the ability to do that comes from having a calling convention.)

One other feature of early C was that prototypes as we know them now didn’t exist. You could declare a function’s return type (e.g., int foo()), but not its arguments (i.e., int foo(int bar) was not an option). This existed because, as outlined above, the program always stuck to a calling convention that could be determined by the arguments. If you called a function with the wrong type of arguments, it was a garbage in, garbage out situation.

Because object code has the notion of a return but not a return type, a compiler has to know the return type to deal with the value returned. When you’re running machine instructions, it’s all just bits and the processor doesn’t care whether the memory where you’re trying to compare a double actually has an int in it. It just does what you ask, and if you break it, you own both pieces.

Consider these bits of code:

double foo();         double foo();
double x;             int x;
x = foo();            x = foo();

The code on the left compiles down to a call to foo() followed by copying the result provided via the call/return convention into wherever x is stored. That’s the easy case.

The code on the right shows a type conversion and is why compilers need to know a function’s return type. Floating-point numbers can’t be dumped into memory where other code will expect to see an int because there’s no magic conversion that takes place. If the end result has to be an integer, there have to be instructions that guide the processor to make the conversion before storage. Without knowing the return type of foo() ahead of time, the compiler would have no idea that conversion code is necessary.

Multi-pass compilers enable all sorts of things, one of which is the ability to declare variables, functions and methods after they’re first used. This means that when the compiler gets around to compiling the code, it has already seen the future and knows what to do. Java, for example, mandates multi-pass by virtue of the fact that its syntax allows declaration after use.


Filed under: softwareengineering - @ 20:35

Tags: c++, declarations, functions, language-design, variables
