Spring20Cs361sLab1
| Assigned | 1/22/2020 |
| Due | 2/5/2020 |
| Points | 100 |
In 1984, Ken Thompson published a paper called "Reflections on Trusting Trust." You should have already read this paper as part of your class readings. If you haven't, read it right now.
Thompson's point was that, in the computing world, our supply chains go very, very deep and malware could be hidden within those supply chains. Even if we review our source code, there's nothing to prevent a compiler from corrupting the code when it converts it to binary.
"No problem," you might be thinking, "I'll just review the compiler's source code."
But the problem is that compilers are themselves compiled. What about the compiler building your compiler?
In this lab, named after the year of Thompson's paper, you will implement the attack Thompson described. That is, you will modify a compiler to corrupt a piece of software during compilation. Then, you will corrupt the compiler to corrupt compilation of the compiler.
Your final corrupted compiler will take uncorrupted compiler source code as input and produce a corrupted binary. The corrupted binary should be able to corrupt uncorrupted compilers and a target piece of source code that will be provided to you.
Just to be clear:
- S_C is the uncorrupted compiler source code
- S_C' is the corrupted compiler source code
- C is the uncorrupted compiler binary
- C' is the corrupted compiler binary
- S_A is the uncorrupted application source code
- A is the uncorrupted application
- A' is the corrupted application
Note that there is no corrupted application source code.
You need to write corrupted compiler source code S_C' such that the corrupted compiler it produces performs the following functions:
- S_C -> C' -> C'; that is, the corrupted compiler will take uncorrupted compiler source and produce a corrupted compiler
- S_A -> C' -> A'; that is, the corrupted compiler will take uncorrupted application source and produce a corrupted application
The most important observation is this: once the corrupted compiler is installed on the victim's machine, there is no corrupted source code. The malware compiler will self replicate, infecting uncorrupted compiler source code.
For this lab, you will need the following tools and a general knowledge of how to use them:
- git
- gcc
- telnet
- a POSIX-compatible build environment (the lab was tested in an Ubuntu WSL environment)
For this part of the assignment, you need to test out
building the compiler and building the application code.
We could mess with the gcc compiler, but have you ever
tried to build gcc? It takes 10 minutes just to unpack
it.
To make your life more manageable, we will use a C compiler
known as the "Tiny C Compiler," or tcc. tcc is a great
little compiler, but it does have a few issues here and
there. For our lab, you can't download one of the
release versions; you'll have to get it from github
using the following command:
git clone https://github.com/TinyCC/tinycc
Once cloned, you need to switch to the mob branch, which
has the corrections to the code that we need.
git checkout mob
You should keep a clean (uncorrupted) copy of this code
as well as have a copy for your development. You can,
of course, just check out a clean copy from git any
time you need to, but it might be easier to just
copy your repo. Or, you could fork the tcc repo
in git. So long as you can readily have clean code
in addition to your modified code, you will be fine.
When tcc is installed, it is usually installed into
system folders. I highly recommend that you create a
directory under your home folder for the installation.
mkdir /home/my_username/tcc_root
Or something similar. When you configure the tcc build,
you can tell the build system to install to this location.
This is done from within the tcc code directory:
./configure --prefix=/home/my_username/tcc_root
After configuration is complete:
make
make install
You can now compile C programs using ~/tcc_root/bin/tcc.
In fact, you can compile tcc with tcc! Try it out.
make clean
./configure --prefix=/home/my_username/tcc_root --cc=/home/my_username/tcc_root/bin/tcc
make
make install
The other program that you need to compile is our target
application called tinypot. tinypot is a very small
"honeypot". We will talk some about "honeypots" during the
year, but basically a "honeypot" is any program designed
to lure in a hacker. The honeypot can be used to slow
them down/waste their time, but it can also collect
information about them.
I was not the original author of tinypot. But I have
modified the code to include a simple login with a hard-coded
set of authorized users. You can get the code you need
from the class github:
git clone https://github.com/CrimsonVista/UTAustin-Courses
cd UTAustin-Courses/2020sp_cs316s/labs/lab1/tinypot
There are only three files in this path: tinypot_main.c,
tinypot_process.c, and tinypot_process.h. There is no
makefile; to build the application:
/home/my_username/tcc_root/bin/tcc tinypot_main.c tinypot_process.c -pthread -o tinypot
This will create the application tinypot within the folder. To
run tinypot listening on port 9999 type the following command:
tinypot - 9999
This starts up the application. You can connect to it with telnet (from another shell)
telnet localhost 9999
After a minute, you should see a login prompt. There are two
legal usernames: user1/password1 and user2/password2.
If you enter legitimate data, you will see a prompt ($)
that will simply echo back anything you type. If the auth data
is not legitimate, the connection ends.
Now that you can build your compiler and the tinypot application,
the next step is to modify the compiler to insert a backdoor
when compiling tinypot.
First, let's look at how we want to modify the tinypot code.
You can find the hard-coded user authentication data list
in the tinypot_process.c file at lines 123-126.
AuthData_t authorizedUsers[] = {
{"user1", "password1"},
{"user2", "password2"}
};
So, your job is to modify the compiler such that, during compilation,
the code it reads from this file gains one extra entry: the username
backdoor with the password backpass.
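To make the goal concrete, here is a sketch of what the compiler should effectively end up compiling. The field names in AuthData_t and the is_authorized function are assumptions for illustration only; the real definitions live in the tinypot sources and may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed layout: the real AuthData_t is defined in the tinypot sources
 * and its field names may differ. */
typedef struct {
    const char *username;
    const char *password;
} AuthData_t;

/* What the corrupted compiler should effectively compile: the original
 * two entries plus a backdoor entry that never appears in the real
 * tinypot_process.c on disk. */
static AuthData_t authorizedUsers[] = {
    {"backdoor", "backpass"},
    {"user1", "password1"},
    {"user2", "password2"}
};

/* Hypothetical lookup, standing in for whatever check tinypot performs.
 * Returns 1 if the username/password pair is in the list, 0 otherwise. */
static int is_authorized(const char *username, const char *password)
{
    assert(username && password);
    size_t count = sizeof authorizedUsers / sizeof authorizedUsers[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(authorizedUsers[i].username, username) == 0 &&
            strcmp(authorizedUsers[i].password, password) == 0)
            return 1;
    }
    return 0;
}
```

The victim's source file still lists only user1 and user2; the third entry exists only in what the compiler sees.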
NOTE: For grading, this data structure is not guaranteed to be in the same place in the file each time. You will need to search for the data structure in order to insert your new line.
There are a number of ways to do this. You're free to choose
any method you'd like so long as it doesn't modify the original
code file. That is, at the end of compilation tinypot_process.c
should not be altered.
However, we will allow you to dynamically create an alternate file that contains the corrupted data. In an ideal world, the compiler wouldn't do this because it's evidence left on the machine for the victim to find. Ideally, the corrupted compiler would modify the data in memory leaving no traces.
But modifying stuff in memory requires understanding how buffers are allocated and potentially re-allocating buffers. For simplicity, we will not require you to do this.
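If you are curious what the buffer handling involves, here is a minimal sketch under simplifying assumptions: the whole source is already in one NUL-terminated string, and it is fine to build a fresh buffer instead of growing the old one. The helper insert_after_line is hypothetical and is not part of tcc; how tcc actually manages its input buffers is something you would need to read in the tcc source.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated copy of src with insert
 * placed right after the first line that contains marker. Returns NULL
 * if marker is absent or allocation fails. The caller frees the result. */
static char *insert_after_line(const char *src, const char *marker,
                               const char *insert)
{
    assert(src && marker && insert);

    const char *hit = strstr(src, marker);
    if (!hit)
        return NULL;

    /* The insertion point is just past the newline that ends the matching
     * line, or the end of the string if that line has no newline. */
    const char *eol = strchr(hit, '\n');
    size_t head = eol ? (size_t)(eol - src) + 1 : strlen(src);
    size_t ins_len = strlen(insert);
    size_t tail = strlen(src) - head;

    char *out = malloc(head + ins_len + tail + 1);
    if (!out)
        return NULL;

    memcpy(out, src, head);
    memcpy(out + head, insert, ins_len);
    /* tail + 1 so the terminating NUL is copied as well */
    memcpy(out + head + ins_len, src + head, tail + 1);
    return out;
}
```

The new buffer is larger than the original, which is exactly why any code that tracks the buffer's length or end pointer would also have to be updated.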
The easiest way to do this is to look at the libtcc.c file.
Around line 600 or so is the function tcc_open. This function
populates a data structure with the filename, some buffers, and
a file descriptor (fd). One possible approach is:
- Check if the filename parameter to tcc_open is tinypot_process.c
- If it is, create a new file (perhaps with a leading . to make it "hidden")
- Read one line at a time from tinypot_process.c
- After reading the line that contains authorizedUsers, insert a new line with the backdoor data
- Point the fd to the modified file instead of the original file
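The copy-and-insert part of the steps above can be sketched as a standalone helper. This is only an illustration: copy_with_insert is a hypothetical name, it is not part of tcc, and wiring it into tcc_open (checking the filename and opening the copy in place of the original) is left out because it depends on the tcc source you are working with.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of tcc): copy src_path to dst_path line
 * by line, writing extra_line immediately after the first line that
 * contains marker. Returns 1 if the line was inserted, 0 if marker was
 * never found, and -1 on I/O error. The original file is only read. */
static int copy_with_insert(const char *src_path, const char *dst_path,
                            const char *marker, const char *extra_line)
{
    assert(src_path && dst_path && marker && extra_line);

    FILE *in = fopen(src_path, "r");
    if (!in)
        return -1;
    FILE *out = fopen(dst_path, "w");
    if (!out) {
        fclose(in);
        return -1;
    }

    char line[4096];  /* assumes no source line is longer than this */
    int inserted = 0;
    while (fgets(line, sizeof line, in)) {
        fputs(line, out);
        if (!inserted && strstr(line, marker)) {
            fputs(extra_line, out);
            inserted = 1;
        }
    }

    int failed = ferror(in) || ferror(out);
    fclose(in);
    if (fclose(out) != 0)
        failed = 1;
    return failed ? -1 : inserted;
}
```

Because the helper searches for the marker instead of counting lines, it keeps working if the data structure moves within the file.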
That's it! You should now be able to compile tinypot and find that
you can login with the backdoor data!
Now, for the final steps of this lab, you need to modify your evil compiler to corrupt un-corrupted compiler source code.
The operation is not all that different from corrupting
the application. The only file that needs to be corrupted
is libtcc.c (unless you did something different from the
approach I suggested). As before, you should ensure that
when that file is compiled the code is modified.
This step is a little bit harder because it's not a one-line modification like it was for the application. Worse, because of a bit of circular logic, you can't just write the modifications directly.
Why?
Because you have to modify the compiler to modify the compiler. So, if you want to have a modified statement, you have to write the modification that inserts the modified statement. But then, you also have to write the modification that inserts the modification that inserts the modified statement.
You may have to do a little thinking to see how to write the self-modifying code. You're welcome to search Google for ideas or ask the TA/professor.
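One well-known way out of this regress is the trick used by quines (programs that print their own source): store the text once as a string, then use that same string twice, once as the format and once as the data filled into it. The sketch below shows only that core idea on a single declaration. The names payload and render_payload are made up for illustration, and this is not presented as the required approach for the lab.

```c
#include <assert.h>
#include <stdio.h>

/* The text to reproduce, stored once. %c stands for a double quote
 * (ASCII 34) and %s stands for this very string. */
static const char *payload = "static const char *payload = %c%s%c;";

/* Hypothetical helper: write the declaration of payload, exactly as it
 * appears above, into out. Returns whatever snprintf returns. */
static int render_payload(char *out, size_t size)
{
    assert(out && size > 0);
    /* payload is used as the format and also as the %s argument */
    return snprintf(out, size, payload, 34, payload, 34);
}
```

The rendered text is character for character the declaration line of payload, so text inserted this way carries its own means of being inserted again. This toy works only because the string contains no quotes, backslashes, or newlines; real inserted code would need those escaped or encoded some other way.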
But, once you have it working you should be able to test by
compiling tcc with the corrupted tcc. The corrupted tcc
should be able to corrupt any other compilations of tcc.
And, of course, the corrupted tcc should corrupt tinypot.
You must complete this assignment individually, but I encourage collaboration. I will adjust the collaboration policy throughout the semester based on the assignment and how well the class is doing. For this assignment, you must write all of your code by yourself, and you may not look at anyone else's code. If you need debugging help, please talk to the professor or the TA.
But you may work together by discussing concepts, strategies, and even generic algorithms and pseudo-code. So long as you are not writing code together, or sharing code, you may work in any size group.
Each submission is individual via github. To submit:
- Use git to create a patch for tinycc using git diff > tinycc.patch from the tinycc repo
- Commit the patch in your class repo under labs/lab1/tinycc.patch
- Tag your commit using git tag lab1-1.0. If you make a change after submission, re-commit and tag with git tag lab1-1.x where x is one greater than the last submission
Your submission will be graded as follows:
- 50 points. Modified compiler correctly corrupts the tinypot application
- 50 points. Modified compiler correctly corrupts tcc
- 20 points (extra credit). Modifies compiled files in memory without creating corrupted files on disk (all corruption in memory)
If you want the extra credit mentioned above,
include a file labs/lab1/inmemory.txt that describes
how you perform the in-memory modifications.