Section 2: Compiling and Running C Programs¶
This discussion section serves as a gentle introduction to the basics of
compiling and running C programs on the ecelinux machines.
1. Logging Into ecelinux with VS Code¶
As we learned in the last discussion section, we will be using the
ecelinux
servers for all of the programming assignments. In the last
discussion section we used PowerShell to log into the ecelinux
servers.
While PowerShell (perhaps in combination with Micro) is perfectly fine
for basic work at the Linux command line, it is not a productive way to
develop large and complicated software engineering projects.
In this discussion section, we will use VS Code to log into the
ecelinux
servers which is the recommended remote access option. VS Code
provides a nice GUI for navigating the directory hierarchy on ecelinux
,
great syntax highlighting for C/C++ programs, the ability to open many
files at once using tabs, and an integrated remote terminal for running
commands at the Linux command line. When using VS Code it is important to
keep in mind that the GUI interface runs completely on the local
workstation and then automatically handles copying files back and forth
between the local workstation and the ecelinux
servers.
Note, if you have already installed VS Code on your laptop, then you should feel free to use your laptop for this discussion section. However, if you have not already installed VS Code on your laptop and verified it works, then please use the workstations in 225 Upson. We do not have time to help you set up VS Code on your own laptop in the discussion section.
1.1. Logging into ecelinux Servers with VS Code¶
To start VS Code click the Start menu then choose VS Code > VS Code, or click the Start menu, type VS Code, and choose VS Code.
Now we need to log into the ecelinux
servers. Choose View > Command
Palette from the menubar. This will cause a little "command palette" to
drop down where you can enter commands to control VS Code. Enter the
following command in the command palette:
    Remote-SSH: Connect Current Window to Host...
As you start typing matching commands will be displayed and you can just click the command when you see it. VS Code will then ask you to Enter SSH Connection Command, and you should enter the following:
    netid@ecelinux.ece.cornell.edu
Replace netid
with your Cornell NetID in the command above.
You may see a pop-up which says that the Windows Defender Firewall has
blocked some features of this app. This is not a problem. Simply click
Cancel. You might also see a drop-down which asks you to choose the
operating system of the remote server with options like Linux and
Windows. Choose Linux. Finally, the very first time you log into the
ecelinux servers you may see a warning like this:
1 2 3 4 5 | "ecelinux.ece.cornell.edu" has fingerprint "SHA256:smwMnf9dyhs5zW5I279C5oJBrTFc5FLghIJMfBR1cxI". Are you sure you want to continue? Continue Cancel |
Also the very first time you log into the ecelinux
servers you will see
a pop up dialog box in the lower right-hand corner which says Setting up
SSH host ecelinux.ece.cornell.edu (details) Initializing.... It might
take up to a minute for everything to be setup; please be patient! Once
the pop up dialog box goes away and you see SSH:
ecelinux.ece.cornell.edu in green in the lower left-hand corner of VS
Code then you know you are connected to the ecelinux servers.
The final step is to make sure your extensions for C/C++ are also
installed on the server. Choose View > Command Palette from the
menubar, and search for the same C/C++ extensions we installed earlier.
When you find these extensions, instead of saying Install, the button
should now say Install in SSH: ecelinux.ece.cornell.edu. Install the
C/C++ language extension on the ecelinux servers. You only need to do
this once; the next time you connect, this extension will already be
installed on the ecelinux servers.
1.2. Using VS Code¶
VS Code includes an integrated file explorer which makes it very
productive to browse and open files. Choose View > Explorer from the
menubar, and then click on Open Folder. VS Code will then ask you to
Open File Or Folder with a default of /home/netid
. Click OK.
You might see a pop-up which asks you Do you trust the authors of the
files in this folder? Since you will only be browsing your own files on
the ecelinux
server, it is fine to choose Yes, I trust the authors.
This will reload VS Code, and you should now see a file explorer in the left sidebar. You can easily browse your directory hierarchy, open files by clicking on them, create new files, and delete files.
VS Code includes an integrated terminal which will give you access to the
Linux command line on the ecelinux
servers. Choose Terminal > New
Terminal from the menubar. You should see the same kind of Linux command
line prompt that you saw when using either PowerShell or Mac Terminal.
The very first thing you need to do after logging into the ecelinux
servers is source the course setup script. This will ensure your
environment is setup with everything you need for working on the
programming assignments. Enter the following command on the command line:
    % source setup-ece2400.sh
Note that you do not need to enter the % character. In a tutorial like
this, the % simply indicates what you should type at the command line.
You should now see ECE 2400 in your prompt, which means your environment
is set up for the course.
If you used --enable-auto-setup
in the last discussion section, then
the setup script is already sourced for you automatically when you log
into the ecelinux
servers.
To experiment with VS Code, we will first grab a text file using the
wget
command you learned about in the last discussion section. Enter
the following command on the command line:
    % wget http://www.csl.cornell.edu/courses/ece2400/overview.txt
You can now open a file in the integrated text editor using the code command like this:
    % code overview.txt
Notice how the overview.txt
file opened in a new tab at the top and the
terminal remains at the bottom. This enables you to have easy access to
editing files and the Linux command line at the same time.
1.3. Final Setup¶
Now clone the GitHub repo we will be using in this discussion section using the following commands:
    % source setup-ece2400.sh
    % mkdir -p ${HOME}/ece2400
    % cd ${HOME}/ece2400
    % git clone git@github.com:cornell-ece2400/ece2400-sec02 sec02
    % cd sec02
    % cat README.md
2. Compiling and Running a Single-File C Program¶
We will begin by writing a single-file C program to calculate the average
of two integers similar to what we have studied in lecture. We have
provided you with a template in the avg-main.c
file. Edit the
avg-main.c
file to include an appropriate implementation of the avg
function.
    #include <stdio.h>

    int avg( int x, int y )
    {
      int sum = x + y;
      return sum / 2;
    }

    int main()
    {
      int a = 10;
      int b = 20;
      int c = avg( a, b );
      printf( "average of %d and %d is %d\n", a, b, c );
      return 0;
    }
We use a compiler to compile the C source code into an executable
binary (i.e., the actual bits) that the machine can understand. In this
course we will be using the GNU C compiler (gcc). Let's go ahead and
give this a try:
    % cd ${HOME}/ece2400/sec02
    % gcc -Wall -o avg-main avg-main.c
    % ls
The gcc command takes as input the C source file to compile, and the
-o command line option is used to specify the output executable binary
(i.e., the file with the machine instructions). We also use the -Wall
command line option to report all warnings. After running the gcc
command you should see a new avg-main file in the directory. We can
execute this binary by simply calling it as we would any other Linux
command.
    % cd ${HOME}/ece2400/sec02
    % ./avg-main
Recall that a single dot (.
) always refers to the current working
directory. Essentially we are telling Linux that we want to run the
executable binary named avg-main
which is located in the current
working directory. Repl.it is basically doing these same steps just in
the cloud.
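As an aside on the -Wall option used above, here is a small hypothetical snippet (not one of the provided files) that compiles without errors but does trigger a warning when -Wall is enabled, because the variable is never used:

    #include <stdio.h>

    int main()
    {
      int unused = 42;   // gcc -Wall reports an unused-variable warning here
      printf( "hello\n" );
      return 0;
    }

Warnings like this often point to real bugs, so it is good practice to always compile with -Wall.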
It can be tedious to have to carefully enter the correct commands on
the command line every time we want to compile a C source file into an
executable binary. In the next discussion section, we will explore using
a build framework to automate the process of building our C programs.
The process of executing the avg-main
executable and verifying its
output is called ad-hoc testing. It is ad-hoc because there is no
systematic and automatic way to run and verify tests. In the next
discussion section, we will explore using a test framework to automate
the process of testing our C programs.
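As a sketch of how we might make ad-hoc testing slightly more systematic (this file is purely illustrative and not part of the provided repo), we could have main check the result itself and print whether it matched the expected value:

    #include <stdio.h>

    int avg( int x, int y )
    {
      int sum = x + y;
      return sum / 2;
    }

    int main()
    {
      // ad-hoc check: compare the actual result against the expected value
      int result = avg( 10, 20 );
      if ( result == 15 )
        printf( "passed: avg(10,20) == 15\n" );
      else
        printf( "FAILED: avg(10,20) returned %d, expected 15\n", result );
      return 0;
    }

Even with this check we still have to compile the program, run it, and read its output by hand, which is exactly the kind of busywork a test framework automates.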
Now let's examine the machine instructions using the objdump
command.
    % cd ${HOME}/ece2400/sec02
    % objdump -dC avg-main | less
The objdump
command takes an executable binary and shows you the
machine instructions in a human readable format. We are piping it through
less
so we can scroll through the output. Try and find how many machine
instructions are used to implement the avg
function. Does it seem like
the compiler generated optimized code or unoptimized code? You can exit
less
by pressing the q
key. Let's recompile our program with
optimizations.
    % cd ${HOME}/ece2400/sec02
    % gcc -Wall -O3 -o avg-main avg-main.c
    % objdump -dC avg-main | less
Now how many machine instructions are used to implement the avg
function?
3. Compiling and Running a Multi-File C Program¶
Real C programs are almost never contained in a single file. They require many files which must be individually compiled and then linked together. Linking is the process of merging together different binary files each with its own set of machine instructions. To illustrate this process we will experiment with a function to square a given parameter. Our project will include three files:
 - square.h : header file with the function prototype for the square function
 - square.c : source file with the function definition for the square function
 - square-adhoc.c : ad-hoc test of the square function, which contains main
We will compile the square.c
and square-adhoc.c
files into their own
object files and then link these object files into a complete executable
binary. Here is a figure illustrating the compiler and linker flow.
An object file is like a chunk of machine instructions. We cannot execute an object file directly. We can only link object files to create an executable binary.
Start by creating a header file named square.h
. Header files are the
key to multi-file C programs. The square-adhoc.c
source file needs to
call the square
function, but the square
function is in a different
source file. When we compile the square-adhoc.c
source file, how will
the compiler know that the square
function exists to ensure the
programmer is not accidentally calling an undefined function? How will
the compiler know what parameters the square
function takes, so it can
perform type checking? The square-adhoc.c
source file cannot directly
include square.c
since that would result in the same function being
compiled twice into two different object files (which would cause a
linker error). What we need is a way to tell square-adhoc.c the square
function prototype (i.e., the interface of the function including its
name, parameter list, and return type) but not the square function
definition. We do this with a function declaration. A
function definition specifies both the function prototype (interface) and
the implementation at the same time, while a function declaration just
specifies the function prototype without the implementation. A header
file contains all of the function declarations but no function
definitions. All of the function definitions are placed in a source file
that goes along with the header file. If we want to call a function that
is defined in a different source file, then we simply use the #include
directive to include the appropriate header file. The linker will take
care of making sure the machine instructions corresponding to every
function definition are linked together into the executable binary.
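To make the distinction concrete, here is what a declaration and a definition look like for the avg function from earlier in this section:

    // function declaration: just the prototype (name, parameters, return type)
    int avg( int x, int y );

    // function definition: the prototype plus the implementation
    int avg( int x, int y )
    {
      int sum = x + y;
      return sum / 2;
    }

The declaration is what belongs in a header file; the definition belongs in the corresponding source file.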
We have provided you with the square.h file, which has the following contents.
    int square( int x );
We have provided you with a template for the square.c
file. Edit the
square.c
file to include an appropriate implementation of the square
function.
1 2 3 4 5 6 | #include "square.h" int square( int x ) { return x * x; } |
Notice how our square.c file includes the corresponding square.h file.
This is a best practice that follows the course coding conventions.
Finally, take a look at the provided square-adhoc.c
file:
    #include "square.h"
    #include <stdio.h>

    int main()
    {
      int a = 10;
      int b = square( a );
      printf( "square of %d is %d\n", a, b );
      return 0;
    }
Let's go ahead and compile square.c
and square-adhoc.c
into their
corresponding object files:
    % cd ${HOME}/ece2400/sec02
    % gcc -Wall -c -o square.o square.c
    % gcc -Wall -c -o square-adhoc.o square-adhoc.c
We use the -c
command line option to indicate that gcc
should create
an object file as opposed to a complete executable binary. An object
file is just a piece of machine instructions. Again, we cannot actually
execute an object file; we need to link multiple object files together to
create a complete executable binary. We usually use the .o
filename
extension to indicate that these files are object files. Let's use
objdump
to look inside each of these object files.
    % cd ${HOME}/ece2400/sec02
    % objdump -dC square.o
    % objdump -dC square-adhoc.o
You should be able to see that each object file only contains a few
machine instructions. The square.o
object file only contains machine
instructions that correspond to the square
function, while the
square-adhoc.o
object file only contains machine instructions that
correspond to the main
function.
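One detail worth noting as an aside: the call to square inside square-adhoc.o cannot yet point at real machine instructions, since square lives in a different object file. Instead, the object file records a relocation entry that the linker fills in later. If you are curious, objdump can show these entries:

    % cd ${HOME}/ece2400/sec02
    % objdump -r square-adhoc.o

You should see a relocation record that mentions square in the output.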
Let's link these two object files together to create a complete executable binary that we can actually run.
    % cd ${HOME}/ece2400/sec02
    % gcc -Wall -o square-adhoc square.o square-adhoc.o
    % objdump -dC square-adhoc
Notice that the complete executable binary contains all of the machine
instructions for both the square
and main
functions along with a
bunch of additional system-level code (e.g., for the printf
function).
Let's go ahead and run the executable binary.
    % cd ${HOME}/ece2400/sec02
    % ./square-adhoc
We can simplify this process and do the compilation and linking in a single step by specifying multiple C source files on a single command line.
    % cd ${HOME}/ece2400/sec02
    % gcc -Wall -o square-adhoc square.c square-adhoc.c
    % ./square-adhoc
This of course raises an obvious question: if we can compile a project with multiple files simply by specifying all of the files on the command line, then why did we learn how to (1) compile each file individually into an object file, and (2) link these object files together? For small projects with just 2-3 files there is no need to use object files. However, in a project with thousands of files, specifying every file on a single command line means each recompilation takes a long, fixed amount of time (e.g., many minutes). Even if we make a very small change to a single source file, we have to recompile every source file!
Using object files enables modular compilation. In modular compilation, we only need to recompile those source files that have changed; we can simply reuse the previously compiled object files for the source files that have not changed. Modular compilation can drastically reduce recompile times so that they are proportional to how much you have changed (e.g., less than a second for a small change). One challenge with modular compilation is that it drastically increases the build complexity. There are many more commands to enter on the command line, and we need to carefully track which commands need to be redone whenever we change a C source file. In the next discussion section, we will explore using a build framework to automate the process of modular compilation for complex C programs.
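As a concrete sketch of the idea using the same files and flags as above, suppose we edit only square.c. We only need to recompile that one file and then relink:

    % cd ${HOME}/ece2400/sec02
    % gcc -Wall -c -o square.o square.c
    % gcc -Wall -o square-adhoc square.o square-adhoc.o

The unchanged square-adhoc.o object file is reused as-is, so the work is proportional to the change we made rather than to the size of the whole project.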
4. The C Preprocessor¶
So far we have glossed over what exactly the #include
directive
actually does. This directive is not part of the C programming language
but is instead part of the C Preprocessor which is yet another step in
the compilation flow. The preprocessor takes an input C source file,
preprocesses it, and generates the preprocessed version of the C source
file. It is important to realize that the C preprocessor is not really
part of the C programming language. The C preprocessor simply manipulates
the plain text in the C source files and knows nothing about the C
programming language's syntax or semantics. The C preprocessor is
powerful but also very easy to abuse. Using the C preprocessor can cause
subtle bugs and is usually not necessary. Unfortunately, there are a few
cases where we have no choice but to use the C preprocessor. gcc
takes
care of automatically running the C preprocessor for us. Here is a more
complete look at the compilation flow for our multi-file C program.
You can see the output of the C preprocessor by using the -E
command
line option to gcc
. Try the following:
    % cd ${HOME}/ece2400/sec02
    % gcc -Wall -E -o square.i square.c
    % cat square.i
You should see something like this:
    # 1 "square.c"
    # 1 "<built-in>"
    # 1 "<command-line>"
    # 31 "<command-line>"
    # 1 "/usr/include/stdc-predef.h" 1 3 4
    # 32 "<command-line>" 2
    # 1 "square.c"
    # 1 "square.h" 1
    int square( int x );
    # 2 "square.c" 2

    int square( int x )
    {
      return x * x;
    }
You will see some lines that start with #; these are line markers
inserted by the preprocessor to record which file and line each piece of
text originally came from. You can also see the contents of the square.h
file included into the resulting preprocessed file. All the #include
directive does is simply take the contents of the given file and include
them verbatim.
In our programming assignments, we will use a more sophisticated coding convention for our header files that looks like this:
    #ifndef SEC02_SQUARE_H
    #define SEC02_SQUARE_H

    int square( int x );

    #endif
The #ifndef, #define, and #endif preprocessor directives implement
what is called an include guard, which prevents the contents of a header
file from being included multiple times. If the contents of a header file
are accidentally included multiple times, the compiler will process its
contents twice and this will likely cause an error.
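As a small sketch of the guard in action (this file is purely illustrative and not part of the provided repo), suppose a source file ends up including square.h twice, perhaps indirectly through another header:

    #include "square.h"   // first include: SEC02_SQUARE_H gets defined and
                          //  the declaration of square is pulled in
    #include "square.h"   // second include: SEC02_SQUARE_H is already defined,
                          //  so the #ifndef causes the contents to be skipped

    int main()
    {
      return square( 3 );  // links only if square.c/square.o is also provided
    }

With the guard in place the duplicate include is harmless; without it, the preprocessor would paste the header's contents into the file twice.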
5. Compiling and Running C Programs for PA1¶
Let's experiment with compiling an ad-hoc test for the first programming
assignment using what we have learned in this discussion section. First,
you need to make sure you have accepted the invitation to join the
cornell-ece2400
GitHub organization. Go to this link:
and sign in to GitHub. If you have not accepted the invitation yet, you
will see a page with a link to Join ECE 2400 Computer Systems
Programming. If you have already accepted the invitation (probably by
clicking a link in an automated email from GitHub), then you will see the
cornell-ece2400
GitHub organization. Confirm you can see a repository
with your NetID.
You can now use the following steps to clone your PA repo.
    % mkdir -p ${HOME}/ece2400
    % cd ${HOME}/ece2400
    % git clone git@github.com:cornell-ece2400/netid
    % cd netid
    % tree
where netid is your NetID. Recall that ad-hoc testing involves
compiling a program manually from the command line and using that
program to print out the result of your function. Then you can verify
that the results are as expected. We have included an ad-hoc test for
each implementation in your repo that you can use for early
experimentation.
    % cd ${HOME}/ece2400/netid/pa1-math/src
    % gcc -Wall -o sqrt-iter-adhoc ece2400-stdlib.c sqrt-iter.c sqrt-iter-adhoc.c
    % ./sqrt-iter-adhoc
These ad-hoc tests will not print out the correct value because you haven't completed the programming assignment yet, but this at least illustrates how we can use what we have learned in this discussion section to compile an ad-hoc test from the command line.
6. To-Do On Your Own¶
If you have time, create a new source file named avg3-main.c
in the
${HOME}/ece2400/sec02
directory that contains an avg3
function. This
function should calculate the average of three values instead of just
two. Modify the main
function to properly call your updated function.
Compile your new program and run it to verify it calculates the average
correctly.
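If you get stuck, here is one possible sketch of what avg3-main.c might look like (this is only a hint; the specific test values are arbitrary and there are other reasonable ways to write it):

    #include <stdio.h>

    int avg3( int x, int y, int z )
    {
      int sum = x + y + z;
      return sum / 3;
    }

    int main()
    {
      int a = 10;
      int b = 20;
      int c = 30;
      int d = avg3( a, b, c );
      printf( "average of %d, %d, and %d is %d\n", a, b, c, d );
      return 0;
    }

Compile and run it the same way as before:

    % cd ${HOME}/ece2400/sec02
    % gcc -Wall -o avg3-main avg3-main.c
    % ./avg3-main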