Fuzzing Windows message queues — WTF?

Posted in Vulnerabilities and exploits on July 17, 2010 by decepticonpunk

Yeap, you can’t earn a living by coding fuzzers, analysis-frameworks-to-be-in-100-years, weird YACC stuff etc. Since my monthly income is quite low, I decided to take on a freelance job for a Greek organization which I’d rather not name. Among other things, the job involved reversing an application and creating a keygen, as well as investigating its various points of I/O. Everything went pretty smoothly until I noticed that the application in question defined several WM_APP messages for internal use.

My first step was to launch Visual Studio’s Spy++ and start looking at the events exchanged by the application components. It turned out that most of the entries in Spy++’s list were not that interesting. Nevertheless, the following events caught my attention.

<00001> 000100D0 P message:0x8020 [User-defined:WM_APP+32] wParam:00000004 lParam:02574FD0
...
<00004> 000100D0 P message:0x8002 [User-defined:WM_APP+2] wParam:00000004 lParam:02574FD0
<00005> 000100D0 P message:0x8023 [User-defined:WM_APP+35] wParam:00000008 lParam:025C8C40
<00006> 000100D0 P message:0x8002 [User-defined:WM_APP+2] wParam:00000008 lParam:025C8C40
<00007> 000100D0 P message:0x8020 [User-defined:WM_APP+32] wParam:00000004 lParam:02539FD0
...
<00010> 000100D0 P message:0x8002 [User-defined:WM_APP+2] wParam:00000004 lParam:02539FD0

Due to the nature of the target application, detecting the handlers for those custom events was quite difficult, so I decided to have some fun before firing up IDA Pro. I devoted 10 minutes of my life to writing a tiny C program that would send those custom events to all of the application’s threads. For wParam and lParam I used random values. It turned out that it wasn’t such a dumb idea after all. The target crashed, and then it crashed again, and again, and again…

The root cause of all those access violations was the fact that the target application assumed that the wParam and lParam values were valid memory addresses! For example, a call to SendMessage() like the one below:

lResult = SendMessage(pProcessHwnd->hWnd, 0x8002, 0x00400000, 0x00400000);
fprintf(stderr, "\tLRESULT = 0x%p\n", (void *)lResult);

Resulted in the following output in WinDBG’s command window.

00497744 8b7e28          mov     edi,dword ptr [esi+28h] ds:0023:00400028=00000000
0:003> r? esi
esi=00400000

Since the target application received network input, my next step was to hook all the calls to recv() in order to find any static buffers where I could place my data. For this purpose, I created the following one-liner socket sniffer for WinDBG :-P

bp WS2_32!recv "r $t1 = poi(@esp + 8); pt \"dd @$t1; g\""

I fired up the target, monitored the network traffic and used netcat to send some alphas to one of the network ports the application was listening on. This little test revealed 2-3 candidate buffers that were allocated at fixed addresses. Notice that, so far, no reversing has taken place. All of our assumptions are based on pure observation (which is a bad thing if you’re trying to code a serious exploit).

Continuing with the vulnerable code, after a bunch of irrelevant stuff, I ended up at the following instruction, where “eax” contains the return value of CreateWindow()!

mov     [esi+4], eax

It turns out that we can write “eax” wherever we want! I haven’t figured out if it can be used to execute arbitrary code, but I’m pretty sure the bytes pointed to by the window handle will contain something useful ;-)

So that’s it for today. Before I end this post, I would like to share with you a few links that got my attention this month…

  • windbg.info – A community for WinDBG users (check out the “WinDbg. From A to Z!” PDF, it rocks!).
  • REcon 2010 is over. Waiting for the material to go public! Sean’s slides are already available at his blog.
  • Everything you need to know about SSA.
  • Indeed, it looks familiar.

Cya

– dp

AthCon 2010

Posted in Vulnerabilities and exploits on June 5, 2010 by decepticonpunk

Yop,

I just got back home from Athens! I was a speaker at AthCon 2010 where I gave a presentation on BNF based fuzzing and met several cool people from all over the world. According to the organizers, the slides, whitepapers and videos will hopefully soon be available for download at athcon.org. Until then, you can have a look at bnffuzz (the PoC BNF based fuzzer that I presented) at grhack.net.

AthCon was definitely a great success and an unprecedented event for Greece. Unfortunately I live in a country where the vast majority of people have no vision, no taste, no brains, not anything. Good news is that there are a few who have a strong will to change this rotten situation. I’m talking about the AthCon 2010 staff and especially Kyprianos Vasilopoulos, Christian Papathanasiou and Anna Manousaki who managed to organize the best conference that has ever taken place in Greece. I would like to express my gratitude and respect to these people and to everyone that helped in making AthCon such a great success.

See you next year!

– dp

The C typedef problem

Posted in Compilers on March 5, 2010 by decepticonpunk

EDIT: The ideas described in this post can only be used for a limited set of typedef declarations. Stay tuned for a better technique :-)

Introduction
If you have ever written any program doing analyses on C code, then this isn’t news to you. The famous typedef problem has been the subject of many discussions among compiler developers and many solutions have been proposed. Personally speaking, I wasn’t aware of how serious it was until I had to cope with it. So, you guessed it, I’m currently trying to solve the famous typedef problem that the parser generator of OpenSAT faces. Before moving on to describing any possible solutions, let’s try to define the actual source of evil.

Definition
I am sure you already have the K&R book (I mean, come on, everyone has it), so, jump to page 234 (A13) and have a look at the C grammar. One of the most notable rules is the following:

<type-specifier> ::= "void" | "char" | "short" | "int" | "long" | "float" | "double" | "signed" | "unsigned" | <struct-or-union-specifier> | <enum-specifier> | <typedef-name>
<typedef-name> ::= <identifier>

Looking at the rules above, one can see that a <type-specifier> can be a <typedef-name> and that a <typedef-name> is just an <identifier>. Even if you are not familiar with BNF grammars, it’s pretty easy to understand that this set of rules refers to identifiers declared as types via typedef.

typedef unsigned long our_int_t;

When a C compiler comes across such a declaration, it knows that our_int_t should be treated as an alias for unsigned long and that our_int_t can be used as a type specifier (i.e. as a standard C keyword used to define a basic C type). Notice that type names defined via typedef must always obey the rules for variable names, so, without some extra context information, the C lexer is unable to tell whether a given string is an identifier or a type name! You might think that this is not a problem at all, but let’s have a look at a completely valid C snippet that any decent compiler will happily accept as correct.

typedef unsigned long our_int_t;

int main(void) {
  our_int_t v; /* [1] */
  int our_int_t = 2; /* [2] */
  return 0;
}

Funny, right? At [1], a new variable named v of type our_int_t is declared. The C compiler has already parsed the typedef declaration, so it knows that our_int_t is actually an alias for unsigned long and thus accepts the declaration. So far so good. The real problem starts at [2] where our fictional programmer, interestingly enough, declares a new integer variable named our_int_t which is initialized to 2. Although our_int_t was previously declared as a type name, it is used at [2] as an identifier. Contrary to what you might think, this line is syntactically and semantically correct: the declaration at [2] lives in the inner block scope and simply hides the file-scope typedef name. If you need further convincing, just compile the previous snippet with your favorite compiler.

The previous paragraph describes only one side of the coin. Unfortunately, one more problem arises from this little inelegance in C’s grammar, but it’s not possible to explain it here in detail. The problem lies in the way LALR states are generated. By replacing <typedef-name> with <identifier> (since <typedef-name> ::= <identifier> is also true), a lot of conflicts pop up in the resulting parsing tables. If you want to have a look, grab the yacc C grammar from here, replace TYPE_NAME with IDENTIFIER and run yacc on it.

Solutions
For the past few days I’ve been trying to solve this problem in an elegant and effective way. During all that time I googled a lot and came across some very interesting sources which are worth studying. Here are some links:

1. The typedef problem discussed in comp.compilers here, here, here and here.
2. Same at a very cool blog called The little calculist.

There’s more than one way to solve the typedef problem. It can either be done in the parser or in the lexer. In the former case, the C grammar is modified, while in the latter, lookahead is introduced in the lexer. It’s up to the developer to decide what’s best for him. Personally, I consider messing with the grammar a dangerous practice, so I decided to implement the second solution.

Solving the problem in the lexer was not much trouble. First, <typedef-name> and <identifier> are declared as terminals, e.g. TYPE_NAME and IDENTIFIER as in the yacc grammar mentioned above. The next step is to modify the lexer so that it can tell whether a given string is an identifier or a type name and return the appropriate token type. The following pseudocode is what I actually implemented in C for OpenSAT.

/* Upon reading a "typedef" token, set a flag that
 * indicates we are currently lexing a typedef
 * declaration.
 */
if token.name == "typedef" then
  in_typedef = 1;
fi
...
...
if token.name == ";" then
  ...
  in_typedef = 0;  /* Typedef ends at ";". */
fi
...
...
if token.type == IDENTIFIER then
  /* If this token is already in the type_table hash table
   * and if lookahead is not one of the characters shown
   * below then this token is used as a type specifier.
   */
  if token in type_table then
    if lookahead not one of ['=' ',' '{' ';'] then
      token.type = TYPE_NAME
    fi
  /* If we are currently lexing a typedef declaration and
   * the token is not in type_table then this is a new type
   * name. Insert it in the type_table.
   */
  else if in_typedef == 1 then
    token.type = TYPE_NAME
    insert_in_type_table(token)
  fi
fi

So far this solution seems to work fine. You can have a look at my test program here and the output produced by OpenSAT’s lexer here (notice that the tokens are correctly identified). I still haven’t finished dealing with the typedef problem, so, maybe my solution is not 100% correct. If you think you got a better idea drop me a mail or add a comment! :-)
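For readers who prefer C to pseudocode, here is a minimal sketch of the classification step only. It is not OpenSAT's actual code: the names classify(), type_table and MAX_TYPES are made up for illustration, a fixed array stands in for the hash table, and the in_typedef flag is assumed to be maintained by the caller as in the pseudocode above. It inherits the same limitations mentioned in the EDIT at the top of this post.

```c
#include <string.h>

/* Token kinds; the names mirror the terminals of the yacc grammar. */
enum token_type { IDENTIFIER, TYPE_NAME };

/* Hypothetical fixed-size type table (a real lexer would use a hash table). */
#define MAX_TYPES 64
static const char *type_table[MAX_TYPES];
static int type_count;

static int in_type_table(const char *name)
{
    for (int i = 0; i < type_count; i++)
        if (strcmp(type_table[i], name) == 0)
            return 1;
    return 0;
}

/* Classify an identifier-shaped token.
 * in_typedef: nonzero while a typedef declaration is being lexed.
 * lookahead:  the character that follows the token. */
enum token_type classify(const char *name, int in_typedef, int lookahead)
{
    if (in_type_table(name)) {
        /* A known type name followed by one of these characters is being
         * declared as a plain identifier, e.g. "int our_int_t = 2;". */
        if (lookahead != '\0' && strchr("=,{;", lookahead) != NULL)
            return IDENTIFIER;
        return TYPE_NAME;
    }
    if (in_typedef && type_count < MAX_TYPES) {
        /* New type name; the caller must keep the string alive. */
        type_table[type_count++] = name;
        return TYPE_NAME;
    }
    return IDENTIFIER;
}
```

Feeding it the tokens of the earlier snippet, our_int_t is registered while the typedef is lexed, comes back as TYPE_NAME at [1] and as IDENTIFIER at [2], where the lookahead is '='.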

– dp

In Berlin everything’s about *wurst\x00

Posted in Compilers on January 1, 2010 by decepticonpunk

EDIT: Nope, the title is not related to Kaminsky’s 26C3 talk. If it was so, the title would have been something like *\x00wurst ;-)

So, I just got back from Berlin where I attended the 26C3 which was a great success. All of us were there: argp, huku, solidsnk, brat, xorl, ithilgore, sin and gorlist. We drank beers and we had some fun and interesting talks together. Personally speaking, I enjoyed FX’s talk and the Phenoelit party the most :-)

It’s now time to get back to business and studying. During my time off, I managed to finish the LR state generation code for libast (Abstract Syntax Tree library). Libast, which is still under development, is responsible for parsing the input stream according to a BNF grammar given in libbnf’s format. So far, about 80% of libast is complete and works like a charm. For debugging purposes, libast comes with Graphviz visualization support. For example, here’s how libast visualizes the traditional Dragon book expression grammar. The output produced by OpenSAT is here (executed via valgrind). The idea of visualizing the states belongs to guerrilla (yeah this guy rocks, he uses formal parsing methods to parse ASM code!).

So, back to hardcore studying since I am taking exams in about one month. In the meantime, during my free time, I’ll be finishing my byacc source code analysis and I’ll be checking gcc’s libcpp (I am planning to modify it and insert it in the OpenSAT source tree). Another idea that I came up with is the use of Python-based semantics in Syntax Directed Translation schemes, e.g. consider a standard byacc input file with the difference that the semantic actions within { and } are written in Python. So, upon reducing the input according to a grammar rule, the semantic action is passed to the Python interpreter for execution. If you have some experience with that please contact me, I’d really like to know if this works fine in real life applications!

– dp

10 things you should be careful about when auditing sources

Posted in Vulnerabilities and exploits on October 16, 2009 by decepticonpunk

While busy studying the byacc source code (expect more on this soon), I came across an old list that I once assembled. It was a list of 10 very common C programming pitfalls that, when exploited, may lead to arbitrary code execution. I decided to publish it here in order to help my blog readers identify bugs in C code more easily. It’s by no means complete, so feel free to contact me if you want to contribute. After checking for trivial signedness bugs, null pointer dereferences, simple stack & heap overflows etc., make sure that you also check the following cases. Notice that most of them are real examples found in commonly used open source software.

Case 1: Making use of snprintf()’s return value
Many programmers use the value returned by snprintf() in order to calculate the next free position in a buffer. Consider the following example. Our imaginary programmer tried to copy two user controlled buffers inside another. He also used snprintf() for security purposes since strcat()/strcpy() are considered dangerous.

int pos;
/* ... */
pos = snprintf(buffer, size, "%s", user_controlled_buffer1);
size -= pos;
pos = snprintf(buffer + pos, size, "%s", user_controlled_buffer2);

This is a very common mistake. According to snprintf()’s manual page, the returned value may be equal to or greater than the size of the target buffer, indicating that the output was truncated and that more space is needed for the user data to fit in the destination. If the first call to snprintf() returns a position greater than sizeof(buffer), then the second snprintf() will attempt to write data outside the target buffer’s boundaries. Additionally, size -= pos may produce a negative value (or a huge one, if size is unsigned), which in turn may lead to other problems.
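A minimal sketch of a safer way to chain the two calls, assuming the caller tracks the write position itself. The helper name append_str() and its return convention are made up for this example; the only library behaviour it relies on is the C99 rule that snprintf() returns the length the full output would have had.

```c
#include <stdio.h>
#include <stddef.h>

/* Append src to dst at offset *pos without ever writing past
 * dst[size - 1]. Returns 0 on success, -1 on error or truncation. */
int append_str(char *dst, size_t size, size_t *pos, const char *src)
{
    if (*pos >= size)
        return -1;                  /* no room left, not even for '\0' */

    size_t avail = size - *pos;
    int n = snprintf(dst + *pos, avail, "%s", src);

    if (n < 0)
        return -1;                  /* output error */
    if ((size_t)n >= avail) {
        *pos = size - 1;            /* truncated: point at the final '\0' */
        return -1;
    }
    *pos += (size_t)n;              /* advance by what was really written */
    return 0;
}
```

The point is that the position only ever advances by the number of bytes that actually landed in the buffer, so a second call can never start outside of it.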

Case 2: Buffer increase on demand
This is actually similar to case 1 but it doesn’t lead to directly exploitable conditions. Consider a program that describes I/O buffers using structures. For example, one such structure may contain the actual data as well as an integer indicating the data length.

typedef struct {
  char *data;
  int len;
} io_t;

I’ve encountered a bunch of applications doing stuff like this:

io_t *whatever;
/* ... */
whatever->len += snprintf(whatever->data, size, "%s", user_controlled_buffer);

The variable whatever->len may eventually receive a value greater than the real size of the data region. Although this is not directly exploitable, it usually leads to exploitable conditions.

Case 3: Using strncpy() safely
Ok this is probably the most common mistake. Calling strncpy() like this…

strncpy(buffer, user_controlled_buffer, sizeof(buffer));

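…is kinda nasty, since strncpy() will not null terminate buffer when the source is sizeof(buffer) bytes or longer, which opens the door to off-by-one errors. On the contrary…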

strncpy(buffer, user_controlled_buffer, sizeof(buffer) - 1);

…is much safer, provided that the last byte of buffer is already zero or is explicitly set to zero afterwards. No need to discuss this further since many public exploits target such a vulnerability and there are plenty of resources on this matter. I still wonder why I included strncpy() in this list!
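A minimal sketch of the second form with the explicit terminator added. The wrapper name copy_string() and its return value are assumptions made for this example, not something taken from a real code base.

```c
#include <string.h>

/* Copy src into dst (dst_size bytes) and always null terminate it.
 * Returns 1 if src fit entirely, 0 if it had to be truncated. */
int copy_string(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0)
        return 0;                      /* nothing can be stored */

    strncpy(dst, src, dst_size - 1);   /* may leave dst unterminated */
    dst[dst_size - 1] = '\0';          /* so terminate it explicitly */

    return strlen(src) < dst_size;
}
```

Returning the truncation status lets the caller decide whether a cut-off string is acceptable, instead of silently working with partial input.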

Case 4: realloc() frees the original chunk on success but not on failure
Here’s another very common C snippet.

new = realloc(old, new_size);
if(!new)
  return;

This is definitely a memory leak, since the call to realloc() won’t free the old chunk in case it fails. This problem alone is not sufficient to cause an exploitable condition, yet it is a blessing for all those people who code heap exploits. By properly forcing the target program to leak memory, an attacker may set up the heap the way they like.
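One way to avoid the leak, sketched under the assumption that the caller has no further use for the old data once growing fails. The helper name grow_buffer() is made up; keeping the old chunk and merely reporting the error would be an equally valid policy.

```c
#include <stdlib.h>

/* Grow *bufp to new_size bytes. On failure the old chunk is freed and
 * *bufp is set to NULL, so nothing leaks and nothing dangles.
 * Returns 0 on success, -1 on failure. */
int grow_buffer(void **bufp, size_t new_size)
{
    void *tmp;

    /* realloc(p, 0) is not portable (it may or may not free p),
     * so refuse it here instead of guessing. */
    if (new_size == 0)
        return -1;

    tmp = realloc(*bufp, new_size);
    if (tmp == NULL) {
        free(*bufp);        /* realloc() left the old chunk untouched */
        *bufp = NULL;
        return -1;
    }
    *bufp = tmp;
    return 0;
}
```

Going through a temporary pointer is the important part: the result of realloc() is never assigned over the only reference to the old chunk.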

Case 5: open() race conditions
Most programs need to perform I/O on a file or device. For security purposes, they usually perform various checks first, e.g. whether the file belongs to root, whether it is world writable and so on. If the sanity tests are successfully passed, they continue by actually opening the target file using open() or some similar function. This way of opening files allows the target file or device to be modified within a time window (i.e. after the checks have taken place but before it is actually opened). The AUCERT security checklist, which used to be here, was a neat source of information on how to avoid race conditions. Unfortunately, the link is now dead.

AUCERT proposed that one should check the inode of the target file before and after it is opened. If the inodes do not match then this is probably an indication of a symlink attack or a race condition. I promise I’ll post secure_open.c here when I have some free time to actually implement it :-)
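Until then, here is a minimal sketch of the inode comparison described above, using only POSIX calls (lstat(), open(), fstat()). It is not the promised secure_open.c and not AUCERT's code; the function name and the decision to accept only regular files are assumptions made for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Open path only if it is a regular file and the object inspected
 * before the open() is the one the descriptor refers to afterwards.
 * flags must not contain O_CREAT or O_TRUNC.
 * Returns a file descriptor, or -1 on any mismatch or error. */
int secure_open(const char *path, int flags)
{
    struct stat before, after;
    int fd;

    /* lstat() does not follow symlinks, so a symlink is rejected here. */
    if (lstat(path, &before) != 0 || !S_ISREG(before.st_mode))
        return -1;

    fd = open(path, flags);
    if (fd < 0)
        return -1;

    /* fstat() inspects the file that was actually opened. */
    if (fstat(fd, &after) != 0 ||
        before.st_dev != after.st_dev ||
        before.st_ino != after.st_ino) {
        close(fd);          /* the file was swapped in the meantime */
        return -1;
    }
    return fd;
}
```

Any further checks (owner, permissions and so on) are best made on the values returned by fstat(), since those describe the opened file and not whatever the path happens to point at later.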

Case 6: Pattern matching is more or less dangerous
Some months ago, a friend of mine was trying to bypass the registry checks performed by a very famous and widely used antivirus suite for Microsoft Windows. His code called RegOpenKey() on a registry location which was considered a security threat by the AV rules. The first thing we actually tested was to slightly obfuscate the registry path by adding extra backslashes to it.

\\\\\\\\\\path\\\\\\\\to\\\\\\\\registry\\\\\\\\key

And guess what? It actually worked! Never underestimate a stupid idea!
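From the defender’s side, the lesson is to normalize a path before matching it against any rule. A minimal sketch, assuming that collapsing repeated backslashes is the only normalization needed (a real filter would also have to deal with case, alternative separators and so on); the function name is made up.

```c
#include <stddef.h>

/* Collapse every run of backslashes in path into a single one,
 * in place. Returns the new length of the string. */
size_t collapse_separators(char *path)
{
    size_t r = 0, w = 0;

    while (path[r] != '\0') {
        if (path[r] == '\\' && w > 0 && path[w - 1] == '\\') {
            r++;                    /* skip the repeated separator */
            continue;
        }
        path[w++] = path[r++];
    }
    path[w] = '\0';
    return w;
}
```

Pattern matching should then be done on the normalized string only, never on the raw input.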

Case 7: Using unions safely
I first noticed this one in Dovecot’s secure coding guide. If you haven’t read it yet, then you should do it now. This little text file states that mixing integer and pointer members in unions may result in serious problems that can be easily exploited to achieve arbitrary code execution as well as other fancy stuff :-)
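When such a union cannot be avoided, a common mitigation is to pair it with a tag and check the tag before every access. The sketch below is my own illustration, not code from Dovecot or from its guide; all names are made up.

```c
#include <stddef.h>
#include <string.h>

enum value_kind { VALUE_INT, VALUE_STR };

struct value {
    enum value_kind kind;       /* tells which union member is live */
    union {
        long        num;
        const char *str;
    } u;
};

/* Length of the string held by v, or 0 if v does not hold a string.
 * The integer member is never interpreted as a pointer. */
size_t value_strlen(const struct value *v)
{
    if (v->kind != VALUE_STR || v->u.str == NULL)
        return 0;
    return strlen(v->u.str);
}
```

The tag only helps if every accessor checks it and if the attacker cannot overwrite it, so this narrows the problem rather than removing it.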

Case 8: Authentication via environment variables
I was quite surprised to see that the external authentication mechanism used by pureftpd makes use of environment variables. More precisely, after receiving the credentials from the user, pureftpd exports the given username and password in a pair of environment variables. Then, pureftpd calls the authentication backend which, in turn, decides if authentication is successful. OpenBSD (and possibly others?) implements the kvm_getproc2() and kvm_getenvv2() functions, which allow a non-root user to read the environment of another (possibly privileged) process. There exists a time window (starting before the execve() of the authentication backend and ending after the backend calls unsetenv()) during which a non-root user can sniff the usernames and passwords sent to the ftp server. The following code demonstrates this technique. There’s at least one more widely used open source server that uses this kind of authentication… be careful!

Case 9: Be careful when using free() in for() loops
Invalid usage of free() in for() loops may result in double frees or in invalid memory being accessed. Here’s a very cool example which can be found in K&R page 167.

for(p = head; p != NULL; p = p->next) /* This is wrong! */
  free(p);

Notice that once the pointer ‘p’ has been passed to free(), it is not legal to evaluate p = p->next in the for() loop, because ‘p’ is no longer guaranteed to point to valid memory. The correct way of freeing a list of items is the following:

for (p = head; p != NULL; p = q) {
  q = p->next;
  free(p);
}

Case 10: Null termination tricks
Last but not least, I’ve come across several applications doing the following in order to null terminate a buffer.

strncpy(buffer, user_controlled_buffer, sizeof(buffer) - 1);
buffer[strlen(buffer) - 1] = 0;

This is a very dangerous practice. Notice that an empty user_controlled_buffer[] can result in a null byte landing on buffer[-1]. This, in turn, may result in unexpected behavior and possibly exploitable conditions. Generally speaking, any code of the form…

buffer[len - 1] = 0;

…is very dangerous when ‘len’ is tainted ;-)
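A minimal sketch of the bounds check that such code is missing, assuming the real size of the buffer is known at the call site. The helper name terminate_at() is made up for this example.

```c
#include <stddef.h>

/* Write a terminator at buf[len - 1], but only if that index really
 * lies inside buf (buf_size bytes). Returns 0 on success, -1 if len
 * is zero or out of range. */
int terminate_at(char *buf, size_t buf_size, size_t len)
{
    if (len == 0 || len > buf_size)
        return -1;          /* would hit buf[-1] or run past the end */
    buf[len - 1] = '\0';
    return 0;
}
```

If the goal is simply to make sure the buffer is terminated, writing to buffer[sizeof(buffer) - 1] avoids the tainted length altogether.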

– dp

Announcing libbnf v1.0!

Posted in Compilers on September 22, 2009 by decepticonpunk

EDIT: The download links were modified to point to the GR Hack CVS server. Read this for more info on how to access the repository.

I finally found some time to finish and release libbnf. Libbnf is a tiny C library that can parse a Backus-Naur Form grammar from a text file and create a graph-like data structure out of it. Libbnf can also visualize the parsed grammar via Graphviz – it actually exports the BNF grammar in Graphviz’s .dot format which you can later use in order to create a jpeg, png or even a vector image.

For example, here’s how libbnf visualizes this C grammar (K&R with minor modifications).

Libbnf requires libdatastruct which you can download from here. Unfortunately, libdatastruct still lacks proper documentation.

You can download libbnf from this location. Make sure you read the README, test.c, test.bnf and C.bnf files before using libbnf in your programs. If you encounter any problems, bugs etc. let me know by mailing me. Please do not post compilation errors in the comment section!

Waiting for your feedback!
– dp

Python in noexec-land

Posted in Vulnerabilities and exploits on September 17, 2009 by decepticonpunk

About two or three days ago, slasher and I had our special version of wargames. We quickly noticed that although the system we came across used a vulnerable kernel, it actually had all the writable partitions mounted as noexec. So what could we do?

It is commonly believed that interpreted languages like perl, ruby or python are pretty useful when you are under strict noexec restrictions. The source scripts can be executed without requiring +x on the target .pl, .rb or .py file. So that was the next thing we looked for, and we immediately found out that the target box had perl and python installed.

Perl and python are both very nice programming languages, but personally I prefer python. Well, actually it is not only a matter of personal taste. Python supports a feature called FFI (Foreign Function Interface) which allows any python coder to directly call any C function from any shared object. FFI is not a python specific feature, in fact, the term FFI usually refers to interpreter infrastructure. For more info you can have a look here and here. Python’s FFI includes support for structures, unions and… pointers!

Great news! Since python permits pointer usage via the ctypes FFI library, it follows that we can allocate, deallocate, dereference and even find the address of a buffer within the context of an interpreted language (does perl have a feature like that?). By combining what was said so far, we can build or port any exploit in python. We chose to port the public exploit for the proto_ops[] NULL pointer dereference, a bug discovered by Julien Tinnes and Tavis Ormandy of Google security, and it worked like a charm.

sh-3.2$ uname -a
Linux xxx 2.6.27.8-xxx #4 SMP PREEMPT Sun Aug 9 20:31:40 EEST 2009 i686 Intel(R) Core(TM)2 Duo CPU T5450 @ 1.66GHz GenuineIntel GNU/Linux
sh-3.2$ python proto_ops_exp.py
Linux <= 2.6.30.4 proto_ops[] NULL pointer dereference exploit
Using Python's FFI to bypass noexec!

# Current uid=101 and current gid=102
# Reported page size is 4096 bytes
# Copying uid and gid in the heap
# Copied 4 bytes at 0x082c3940
# Copied 4 bytes at 0x082c3930
# Copying "/bin/sh" string in the heap
# Copied 7 bytes at 0x082c4f50
# Copying exit_code() in the heap
# Copied 15 bytes at 0x08276678
# Copying exit_stack[] in the heap
# Copied 4096 bytes at 0x082dfb80
# Loading kernel_code() in the null page
# Copied 406 bytes of shellcode at 0x00000000
# If you don't get root you are an idiot
bash-3.2# id
uid=0(root) gid=0(root) groups=102(xxx)
bash-3.2# echo burp && exit
burp
exit

For very obvious reasons, I won’t publish the python code of the above exploit. It is fairly easy for any programmer to code one of their own (the python manuals are self explanatory and the vulnerability has been public since around August 13th). So, have a nice time porting your code!

– dp

printf(“Hello world!\n”);

Posted in Uncategorized on August 26, 2009 by decepticonpunk

Hello everyone, this is my first post. Hopefully, soon enough I’ll start posting more interesting stuff than those ugly introductory texts. For now, you can only have a look at what this blog is about.

See you soon!
– dp