Section 2: Compiling and Running C Programs

This discussion section serves as a gentle introduction to the basics of compiling and running C programs on the ecelinux machines.

1. Logging Into ecelinux with VS Code

As we learned in the last discussion section, we will be using the ecelinux servers for all of the programming assignments. Last time, we used PowerShell to log into the ecelinux servers. While PowerShell (perhaps in combination with Micro) is perfectly fine for basic work at the Linux command line, it is not a productive way to develop large and complicated software engineering projects.

In this discussion section, we will use VS Code, the recommended remote access option, to log into the ecelinux servers. VS Code provides a nice GUI for navigating the directory hierarchy on ecelinux, great syntax highlighting for C/C++ programs, the ability to open many files at once using tabs, and an integrated remote terminal for running commands at the Linux command line. When using VS Code, it is important to keep in mind that the GUI runs completely on the local workstation, and VS Code automatically handles copying files back and forth between the local workstation and the ecelinux servers.

Note: if you have already installed VS Code on your laptop, then you should feel free to use your laptop for this discussion section. However, if you have not already installed VS Code on your laptop and verified it works, then please use the workstations in 225 Upson. We do not have time to help you set up VS Code on your own laptop during the discussion section.

1.1. Logging into ecelinux Servers with VS Code

To start VS Code, click the Start menu and then choose VS Code > VS Code, or click the Start menu, type VS Code, and choose VS Code.

Now we need to log into the ecelinux servers. Choose View > Command Palette from the menubar. This will cause a little "command palette" to drop down where you can enter commands to control VS Code. Enter the following command in the command palette:

Remote-SSH: Connect Current Window to Host...

As you start typing, matching commands will be displayed, and you can just click the command when you see it. VS Code will then ask you to Enter SSH Connection Command, and you should enter the following:

netid@ecelinux.ece.cornell.edu

Replace netid with your Cornell NetID in the command above.

You may see a pop-up which says that the Windows Defender Firewall has blocked some features of this app. This is not a problem. Simply click Cancel. You might also see a drop-down which asks you to choose the operating system of the remote server with options like Linux and Windows. Choose Linux. Finally, the very first time you log into the ecelinux servers you may see a warning like this:

"ecelinux.ece.cornell.edu" has fingerprint
"SHA256:smwMnf9dyhs5zW5I279C5oJBrTFc5FLghIJMfBR1cxI".
Are you sure you want to continue?
Continue
Cancel

Also, the very first time you log into the ecelinux servers you will see a pop-up dialog box in the lower right-hand corner which says Setting up SSH host ecelinux.ece.cornell.edu (details) Initializing.... It might take up to a minute for everything to be set up; please be patient! Once the pop-up dialog box goes away and you see SSH: ecelinux.ece.cornell.edu in green in the lower left-hand corner of VS Code, then you know you are connected to the ecelinux servers.

The final step is to make sure your extensions for C/C++ are also installed on the server. Choose View > Command Palette from the menubar. Search for the same C/C++ extensions we installed earlier. When you find these extensions, instead of saying Install, they should now say Install in SSH: ecelinux.ece.cornell.edu. Install the C/C++ language extension on the ecelinux servers. You only need to do this once; the next time, this extension will already be installed on the ecelinux servers.

1.2. Using VS Code

VS Code includes an integrated file explorer which makes it very productive to browse and open files. Choose View > Explorer from the menubar, and then click on Open Folder. VS Code will then ask you to Open File Or Folder with a default of /home/netid. Click OK.

You might see a pop-up which asks you Do you trust the authors of the files in this folder? Since you will only be browsing your own files on the ecelinux server, it is fine to choose Yes, I trust the authors.

This will reload VS Code, and you should now see a file explorer in the left sidebar. You can easily browse your directory hierarchy, open files by clicking on them, create new files, and delete files.

VS Code includes an integrated terminal which will give you access to the Linux command line on the ecelinux servers. Choose Terminal > New Terminal from the menubar. You should see the same kind of Linux command line prompt that you saw when using either PowerShell or Mac Terminal. The very first thing you need to do after logging into the ecelinux servers is source the course setup script. This will ensure your environment is set up with everything you need for working on the programming assignments. Enter the following command on the command line:

% source setup-ece2400.sh

Note that you do not need to enter the % character. In a tutorial like this, the % simply indicates what you should type at the command line. You should now see ECE 2400 in your prompt, which means your environment is set up for the course.

If you used --enable-auto-setup in the last discussion section, then the setup script is already sourced for you automatically when you log into the ecelinux servers.

To experiment with VS Code, we will first grab a text file using the wget command you learned about in the last discussion section. Enter the following command on the command line:

% wget http://www.csl.cornell.edu/courses/ece2400/overview.txt

You can now open a file in the integrated text editor using the code command like this:

% code overview.txt

Notice how the overview.txt file opened in a new tab at the top and the terminal remains at the bottom. This enables you to have easy access to editing files and the Linux command line at the same time.

1.3. Final Setup

Now clone the GitHub repo we will be using in this discussion section using the following commands:

% source setup-ece2400.sh
% mkdir -p ${HOME}/ece2400
% cd ${HOME}/ece2400
% git clone git@github.com:cornell-ece2400/ece2400-sec02 sec02
% cd sec02
% cat README.md

2. Compiling and Running a Single-File C Program

We will begin by writing a single-file C program to calculate the average of two integers similar to what we have studied in lecture. We have provided you with a template in the avg-main.c file. Edit the avg-main.c file to include an appropriate implementation of the avg function.

#include <stdio.h>

int avg( int x, int y )
{
  int sum = x + y;
  return sum / 2;
}

int main()
{
  int a = 10;
  int b = 20;
  int c = avg( a, b );
  printf( "average of %d and %d is %d\n", a, b, c );
  return 0;
}

We use a compiler to compile the C source code into an executable binary (i.e., the actual bits) that the machine can understand. In this course we will be using the GNU C compiler (gcc). Let's go ahead and give this a try:

% cd ${HOME}/ece2400/sec02
% gcc -Wall -o avg-main avg-main.c
% ls

The gcc command takes as input the C source file to compile, and the -o command line option is used to specify the output executable binary (i.e., the file with the machine instructions). We also use the -Wall command line option to report all warnings. After running the gcc command you should see a new avg-main file in the directory. We can execute this binary by simply calling it as we would any other Linux command.

% cd ${HOME}/ece2400/sec02
% ./avg-main

Recall that a single dot (.) always refers to the current working directory. Essentially we are telling Linux that we want to run the executable binary named avg-main which is located in the current working directory. Repl.it is basically doing these same steps just in the cloud.

It can be tedious to have to carefully enter the correct commands on the command line every time we want to compile a C source file into an executable binary. In the next discussion section, we will explore using a build framework to automate the process of building our C programs. The process of executing the avg-main executable and verifying its output is called ad-hoc testing. It is ad-hoc because there is no systematic and automatic way to run and verify tests. In the next discussion section, we will also explore using a test framework to automate the process of testing our C programs.

Now let's examine the machine instructions using the objdump command.

% cd ${HOME}/ece2400/sec02
% objdump -dC avg-main | less

The objdump command takes an executable binary and shows you the machine instructions in a human readable format. We are piping it through less so we can scroll through the output. Try and find how many machine instructions are used to implement the avg function. Does it seem like the compiler generated optimized code or unoptimized code? You can exit less by pressing the q key. Let's recompile our program with optimizations.

% cd ${HOME}/ece2400/sec02
% gcc -Wall -O3 -o avg-main avg-main.c
% objdump -dC avg-main | less

Now how many machine instructions are used to implement the avg function?

3. Compiling and Running a Multi-File C Program

Real C programs are almost never contained in a single file. They require many files which must be individually compiled and then linked together. Linking is the process of merging together different binary files each with its own set of machine instructions. To illustrate this process we will experiment with a function to square a given parameter. Our project will include three files:

  • square.h: header file with function prototype for square function
  • square.c: source file with function definition for square function
  • square-adhoc.c: adhoc test of square function which contains main

We will compile the square.c and square-adhoc.c files into their own object files and then link these object files into a complete executable binary. Here is a figure illustrating the compiler and linker flow.

compiler flow

An object file is like a chunk of machine instructions. We cannot execute an object file directly. We can only link object files to create an executable binary.

Start by creating a header file named square.h. Header files are the key to multi-file C programs. The square-adhoc.c source file needs to call the square function, but the square function is in a different source file. When we compile the square-adhoc.c source file, how will the compiler know that the square function exists, so it can ensure the programmer is not accidentally calling an undefined function? How will the compiler know what parameters the square function takes, so it can perform type checking? The square-adhoc.c source file cannot directly include square.c, since that would result in the same function being compiled twice into two different object files (which would cause a linker error).

What we need is a way to tell square-adhoc.c the square function prototype (i.e., the interface of the function including its name, parameter list, and return type) but not the square function definition. We do this with a function declaration. A function definition specifies both the function prototype (interface) and the implementation at the same time, while a function declaration just specifies the function prototype without the implementation.

A header file contains all of the function declarations but no function definitions. All of the function definitions are placed in a source file that goes along with the header file. If we want to call a function that is defined in a different source file, then we simply use the #include directive to include the appropriate header file. The linker will take care of making sure the machine instructions corresponding to every function definition are linked together into the executable binary.

We have provided you the square.h file with the following contents.

int square( int x );

We have provided you with a template for the square.c file. Edit the square.c file to include an appropriate implementation of the square function.

#include "square.h"

int square( int x )
{
  return x * x;
}

Notice how our square.c file includes the corresponding square.h file. This is a best practice that follows the course coding conventions. Finally, take a look at the provided square-adhoc.c file:

#include "square.h"
#include <stdio.h>

int main()
{
  int a = 10;
  int b = square( a );
  printf( "square of %d is %d\n", a, b );
  return 0;
}

Let's go ahead and compile square.c and square-adhoc.c into their corresponding object files:

% cd ${HOME}/ece2400/sec02
% gcc -Wall -c -o square.o square.c
% gcc -Wall -c -o square-adhoc.o square-adhoc.c

We use the -c command line option to indicate that gcc should create an object file as opposed to a complete executable binary. An object file is just a piece of machine instructions. Again, we cannot actually execute an object file; we need to link multiple object files together to create a complete executable binary. We usually use the .o filename extension to indicate that these files are object files. Let's use objdump to look inside each of these object files.

% cd ${HOME}/ece2400/sec02
% objdump -dC square.o
% objdump -dC square-adhoc.o

You should be able to see that each object file only contains a few machine instructions. The square.o object file only contains machine instructions that correspond to the square function, while the square-adhoc.o object file only contains machine instructions that correspond to the main function.

Let's link these two object files together to create a complete executable binary that we can actually run.

% cd ${HOME}/ece2400/sec02
% gcc -Wall -o square-adhoc square.o square-adhoc.o
% objdump -dC square-adhoc

Notice that the complete executable binary contains all of the machine instructions for both the square and main functions along with a bunch of additional system-level code (e.g., for the printf function). Let's go ahead and run the executable binary.

% cd ${HOME}/ece2400/sec02
% ./square-adhoc

We can simplify this process and do the compilation and linking in a single step by specifying multiple C source files on a single command line.

% cd ${HOME}/ece2400/sec02
% gcc -Wall -o square-adhoc square.c square-adhoc.c
% ./square-adhoc

This of course raises the question: if we can compile a project with multiple files simply by specifying all of the files on the command line, then why did we learn how to (1) compile each file individually into an object file, and (2) link these object files together? For small projects with just 2-3 files there is no need to use object files. However, in a project with thousands of files, specifying all files on a single command line will cause every recompilation to take a very long time (e.g., many minutes). Even if we make a very small change to a single source file, we will have to recompile every source file!

Using object files enables modular compilation. In modular compilation, we only need to recompile those source files that have changed. We can simply reuse the previously compiled object files for those source files that have not changed. Modular compilation can drastically reduce recompile times, making them proportional to just how many changes you have made to the source files (e.g., less than a second). One challenge with modular compilation is that it drastically increases the build complexity. There are many more commands to enter on the command line, and we need to carefully track which commands need to be redone whenever we change a C source file. In the next discussion section, we will explore using a build framework to automate the process of modular compilation for complex C programs.

4. The C Preprocessor

So far we have glossed over what exactly the #include directive actually does. This directive is not part of the C programming language but is instead part of the C preprocessor, which is yet another step in the compilation flow. The preprocessor takes an input C source file, preprocesses it, and generates the preprocessed version of the C source file. It is important to realize that the C preprocessor simply manipulates the plain text in the C source files and knows nothing about the C programming language's syntax or semantics. The C preprocessor is powerful but also very easy to abuse. Using the C preprocessor can cause subtle bugs and is usually not necessary. Unfortunately, there are a few cases where we have no choice but to use the C preprocessor. gcc takes care of automatically running the C preprocessor for us. Here is a more complete look at the compilation flow for our multi-file C program.

compiler flow

You can see the output of the C preprocessor by using the -E command line option to gcc. Try the following:

% cd ${HOME}/ece2400/sec02
% gcc -Wall -E -o square.i square.c
% cat square.i

You should see something like this:

# 1 "square.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 31 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 32 "<command-line>" 2
# 1 "square.c"
# 1 "square.h" 1
int square( int x );
# 2 "square.c" 2

int square( int x )
{
  return x * x;
}

You will see some lines that start with #; these are line markers that the preprocessor emits so the compiler can report errors and warnings relative to the original source files. After them, you can see the contents of the square.h file included into the resulting preprocessed file. All the #include directive does is take the contents of the given file and include them verbatim.

In our programming assignments, we will use a more sophisticated coding convention for our header files that looks like this:

#ifndef SEC02_SQUARE_H
#define SEC02_SQUARE_H

int square( int x );

#endif

The #ifndef, #define, and #endif preprocessor directives implement what is called an include guard, which prevents the contents of a header file from being included multiple times. If the contents of a header file are accidentally included multiple times, the compiler will process its contents twice, and this will likely cause an error.

5. Compiling and Running C Programs for PA1

Let's experiment with compiling an ad-hoc test for the first programming assignment using what we have learned in this discussion section. First, you need to make sure you have accepted the invitation to join the cornell-ece2400 GitHub organization. Go to this link:

and sign in to GitHub. If you have not accepted the invitation yet, you will see a page with a link to Join ECE 2400 Computer Systems Programming. If you have already accepted the invitation (probably by clicking a link in an automated email from GitHub), then you will see the cornell-ece2400 GitHub organization. Confirm you can see a repository with your NetID.

You can now use the following steps to clone your PA repo.

% mkdir -p ${HOME}/ece2400
% cd ${HOME}/ece2400
% git clone git@github.com:cornell-ece2400/netid
% cd netid
% tree

where netid is your NetID. Recall that ad-hoc testing involves compiling a program manually from the command line and using that program to print out the result of your function. Then you can verify that the results are as expected. We have included an ad-hoc test for each implementation in your repo that you can use for early experimentation.

% cd ${HOME}/ece2400/netid/pa1-math/src
% gcc -Wall -o sqrt-iter-adhoc ece2400-stdlib.c sqrt-iter.c sqrt-iter-adhoc.c
% ./sqrt-iter-adhoc

These ad-hoc tests will not print out the correct value because you haven't completed the programming assignment yet, but this at least illustrates how we can use what we have learned in this discussion section to compile an ad-hoc test from the command line.

6. To-Do On Your Own

If you have time, create a new source file named avg3-main.c in the ${HOME}/ece2400/sec02 directory that contains an avg3 function. This function should calculate the average of three values instead of just two. Modify the main function to properly call your updated function. Compile your new program and run it to verify it calculates the average correctly.