Archive

Posts Tagged ‘javascript’

The fine art of human detection (how to fight bots)

July 30th, 2010

Throughout the history of computing, there have always been programs trying to mimic human actions and behavior. From ELIZA in the sixties to the present day, the task has come up in different fields of computer science: artificial intelligence, natural language processing, image processing, etc. A particular type of program is designed to mimic or automate human-computer interaction. Such programs produce input for other programs as if it came from a person. In this post I’m going to introduce those programs, how they work, why it is important to detect (and stop) them, how it is done today, and some new ideas I have on the subject.

The power of automation can be harnessed, like all powers, for good and evil. On one hand it can save you a lot of time on tedious repetitive tasks such as news checking, file downloading, event handling, etc. On the other hand it can be used by advertisers, spammers and solicitors for mass-mailing, mass-registering, mass-spamming, etc. As you can guess, some organizations would like to use such automation while others would like to block it.

Sometimes it is unclear which side is which. For example, if I want to download a file from a service like “rapidshare”, it presents me with a waiting screen. The sole purpose of this screen, as well as of the other restrictions imposed on non-paying members, is to motivate people to pay. Using automation, my computer could do the waiting and download the files while I’m away, so it’s pretty good from a selfish point of view but pretty bad from their business point of view.

The automation programs can generate data at different levels. To better understand this, let’s first examine a typical web interaction: talk-backs. When you talk back or post a comment, you use your browser, keyboard and probably mouse to select (focus on) the text area designed for this purpose, type whatever you have to say and click “submit” or its equivalent. Under the hood, your browser sends data to the website or, to be more exact, makes an HTTP request to the web server running at the other end, sending data that includes your talk-back. The web server processes this data and responds according to predefined business logic, for example by responding with “thank you” and placing your talk-back in a queue for approval.

The talk-back is sent along with cookies and other data as defined in RFC 2616. Since the interaction is well-defined per site (the protocols, parameter names and everything else used as part of the interaction), it’s fairly easy to make a tool that automatically sends data as if it were a web browser operated by a person. From the web server’s point of view it would look exactly the same. Once the interaction is analyzed and understood (easily done using a proxy or tools like Firebug), a dedicated tool can be built to automate it, for example using the java.net package, or even as a shell script using curl. Please note that these are not hacker utilities. They’re pretty standard “building blocks” for any web-related application or script.
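To make the point concrete, here is a minimal sketch of how little it takes to reproduce a comment submission outside a browser. The endpoint path, field names and cookie name are all hypothetical (every site defines its own), and the sketch assumes a runtime such as Node.js where a script is free to set the Cookie and User-Agent headers.

```javascript
// Build the same HTTP request a browser would send when a comment form is submitted.
// "/comments/post", "author", "comment" and "session" are made-up names for illustration.
function buildCommentRequest(siteUrl, fields, cookies) {
  // Form fields are serialized as application/x-www-form-urlencoded, like a plain HTML form.
  const body = new URLSearchParams(fields).toString();
  const cookieHeader = Object.entries(cookies)
    .map(([name, value]) => `${name}=${value}`)
    .join("; ");
  return {
    url: new URL("/comments/post", siteUrl).toString(),
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/x-www-form-urlencoded",
        "Cookie": cookieHeader,
        // Pretend to be a browser; the server only sees this string.
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
      },
      body,
    },
  };
}

const commentRequest = buildCommentRequest(
  "http://blog.example.com",
  { author: "Alice", comment: "Nice post!" },
  { session: "abc123" }
);
// Passing commentRequest.url and commentRequest.options to fetch() would send it.
```

Nothing in the resulting request tells the server whether a person typed the comment or a loop generated it.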

So how can we determine whether it’s really a person using a browser or an automated tool messing with us? The most common solution today is using CAPTCHAs. I’m sure you’ve all seen them before. A CAPTCHA is a means of challenge-response in which the web server generates a picture, usually containing twisted text, and expects the text to be sent back as data. For a human, it’s supposedly easy to understand the picture and type the text in the required field, and for a computer program it’s supposedly difficult to analyze the picture and “understand” the text.

The original idea behind CAPTCHA is nice, but it is becoming increasingly ineffective. Firstly, it is very possible to use Computer Vision techniques, alone or together with Artificial Intelligence techniques, to recognize the text. Secondly, with publicly available CAPTCHA-solving libraries such as PWNtcha or online services such as DeCaptcher, it’s really a matter of how much time/money one is willing to spend rather than a technological challenge. There are also “indirect” ways to overcome a CAPTCHA, such as analyzing the audio CAPTCHA (sometimes available for accessibility purposes) or passing the CAPTCHA images to a different (preferably high-traffic) web site to be solved by real humans.

As CAPTCHA breakers are closing the gap, it’s time to present them with new challenges. Don’t get me wrong, I’m not picking a side. I’ve been on both sides, and my interest is purely intellectual, so whoever has the upper hand at the moment is irrelevant. I have some ideas for new challenges. Feel free to implement, use or further develop them. I take no responsibility, and I can assure you they are breakable, but they should take the game to the next level.

Let’s first examine the Achilles’ heel of current methods:

  1. They are deterministic: client-server interaction is always the same and can be easily revealed.
  2. Automation tools are challenged with something they need to figure out, and that challenge is usually unrelated to what it is trying to protect.
  3. CAPTCHAs are presented as images in a format suitable for copying and processing.

To raise the bar we have to address those problems. There’s a lot of room for creativity, and I’m not talking about using the current CAPTCHA approach with a new kind of image, as shown here. I’m talking about a new approach. Modern browsers are quite sophisticated. We can use their advanced capabilities as part of the challenge-response, for example their Javascript and rendering engines. Instead of challenging automation tools with “analyze this image” we can challenge them with “execute this script”. This would force them either to implement a Javascript engine or to use a real browser as part of the automation. Both demand a higher level of complexity.

Then comes the randomness element. The script should be different each time it is served. This can be achieved by dynamically generating a self-decrypting script. The decryption should take place during script execution: instead of sending encrypted ciphertext, a decryption script and a key (as in traditional systems), there should be one script that decrypts itself little by little using “evals”, decrypting a block of commands and then executing it. Execution performs what the original commands were intended to do and decrypts the next block. This method hinders attempts to decrypt or analyze the script without executing it, thus forcing the use of a Javascript engine and providing the first element of randomness.
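One possible reading of the self-decrypting idea is sketched below. It is an illustration under simplifying assumptions, not a hardened implementation: the “encryption” is a plain XOR with a random key embedded next to each block, and the helper names (xorEncode, generateSelfDecryptingScript, the one-letter decoder d) are made up for this example. A real deployment would want keys that only exist at run time, for instance derived from the result of the previous block.

```javascript
// XOR each UTF-16 code unit with a repeating key and emit fixed-width hex.
function xorEncode(text, key) {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i) ^ key[i % key.length];
    out += code.toString(16).padStart(4, "0");
  }
  return out;
}

// Decoder that ships in clear text at the top of every generated script.
const DECODER_SOURCE =
  "function d(h,k){var s='';for(var i=0;i<h.length;i+=4){" +
  "s+=String.fromCharCode(parseInt(h.slice(i,i+4),16)^k[(i/4)%k.length]);}return s;}";

function randomKey(length) {
  const key = [];
  for (let i = 0; i < length; i++) key.push(Math.floor(Math.random() * 256));
  return key;
}

// Wrap the blocks from last to first, so each decoded block runs its own
// commands and then decodes and evals the block that follows it.
function generateSelfDecryptingScript(blocks) {
  let nextStage = ""; // nothing follows the last block
  for (let i = blocks.length - 1; i >= 0; i--) {
    const plain = blocks[i] + ";" + nextStage;
    const key = randomKey(8); // fresh key per block, so every served script differs
    nextStage = "eval(d('" + xorEncode(plain, key) + "'," + JSON.stringify(key) + "));";
  }
  return DECODER_SOURCE + nextStage;
}

// Example: three command blocks turned into one layered script.
const servedScript = generateSelfDecryptingScript([
  "var total = 0",
  "total += 40",
  "globalThis.challengeAnswer = total + 2",
]);
```

Only the outermost layer is visible in the served text. Each inner block appears only after the previous one has been decoded and executed, and the text changes on every call because the keys are random.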

The script should present a challenge whose response is context-dependent. For example, for talk-backs or comments you can ask about the article’s content. Beware that it should be smartly implemented. If you ask a multiple-choice question, an automation tool will try all the choices. If the response is a single word from the article, an automation tool may try them all. Anyhow, if it’s inapplicable or you feel it’s more harmful than helpful (by narrowing down the response space), you can always revert to a letters/numbers combination.

The challenge should not consist of an image only. It should be partially an image, maybe as a background, combined with dynamically rendered pixels/lines/polygons/curves or with smaller image portions placed using the browser’s rendering engine. Together they should visually form the “question” (or letters/numbers combination) mentioned in the previous paragraph. This step makes it more difficult for an automation tool to process, copy or understand the challenge.
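As a rough illustration of the “placement of smaller portions” idea, the sketch below splits a challenge string into fragments, gives each one a position, and shuffles the order in which they are emitted. The function names and the glyph width are hypothetical. In a browser each fragment would be drawn at its coordinates (for example as an absolutely positioned element or on a canvas), so the text only reads correctly once it has been rendered.

```javascript
// Fisher-Yates shuffle that returns a new array and leaves the input untouched.
function shuffle(items) {
  const copy = items.slice();
  for (let i = copy.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [copy[i], copy[j]] = [copy[j], copy[i]];
  }
  return copy;
}

// Turn the challenge text into positioned fragments in random document order.
function scatterChallenge(text, glyphWidth) {
  const fragments = Array.from(text).map((glyph, index) => ({
    glyph,
    left: index * glyphWidth,           // horizontal position where it should be drawn
    top: Math.floor(Math.random() * 6), // small vertical jitter
  }));
  return shuffle(fragments);
}

// What a person would read once the fragments are rendered at their positions.
function visualOrder(fragments) {
  return fragments
    .slice()
    .sort((a, b) => a.left - b.left)
    .map((fragment) => fragment.glyph)
    .join("");
}
```

A tool that reads the positions can of course reassemble the text, which is why the fragments are meant to be only one layer on top of a background image and a script that must be executed first.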

Finally, you can add your own spice just to make things more complex. For example, the running script can record the time differences between key presses in the response field and send them along with the response. You can use statistical analysis to determine whether it came from a human (generally, if auto-filled by a robot, even with a random wait between key presses, the standard deviation should be relatively low). This is only one example. You can also invent new things, add random business logic, use self-modifying code… the possibilities are endless.
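The key-press timing check could look something like the sketch below. The 30 ms threshold is an arbitrary assumption for illustration and would have to be tuned on real data. The function names are made up, and the timestamps are assumed to be milliseconds collected by a keydown handler on the response field.

```javascript
// Standard deviation of the gaps between consecutive key-press timestamps (ms).
function intervalStdDev(timestamps) {
  const intervals = [];
  for (let i = 1; i < timestamps.length; i++) {
    intervals.push(timestamps[i] - timestamps[i - 1]);
  }
  if (intervals.length === 0) return 0; // nothing to measure
  const mean = intervals.reduce((sum, value) => sum + value, 0) / intervals.length;
  const variance =
    intervals.reduce((sum, value) => sum + (value - mean) ** 2, 0) / intervals.length;
  return Math.sqrt(variance);
}

// Flag input whose typing rhythm is suspiciously regular.
// Note that a response with no recorded key presses is also flagged,
// because its deviation is 0. Whether that is desirable is a design choice.
function looksAutomated(timestamps, thresholdMs = 30) {
  return intervalStdDev(timestamps) < thresholdMs;
}

const robotTyping = [0, 100, 200, 300, 400]; // perfectly even gaps
const humanTyping = [0, 80, 310, 390, 900];  // uneven gaps
```

Here looksAutomated(robotTyping) is true and looksAutomated(humanTyping) is false. On the server side the same computation would be repeated on the submitted intervals rather than trusting a verdict computed in the browser.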

I hope you liked my ideas. Let me know what you think.

Setting up XMPP BOSH server

June 17th, 2010

This tutorial explains how to set up and troubleshoot an XMPP server with BOSH. I’m not getting into what XMPP is and what it is good for. The first two paragraphs are theoretical. XMPP is a stateful protocol in a client-server model. If a web application needs to work with XMPP, a few problems arise. Modern browsers don’t support XMPP natively, so all XMPP traffic must be handled by a program running inside the browser (JavaScript, Flash, etc.). The first problem is that HTTP is a stateless protocol, meaning each HTTP request is unrelated to any other request. However, this problem can be addressed at the application level, for example by using cookies or POST data.

The second problem is the unidirectional nature of HTTP: only the client sends requests and the server can only respond. The server’s inability to push data makes it unnatural to implement XMPP over HTTP. The problem disappears if the client program can make direct TCP connections (thus eliminating the need for HTTP). However, if we want to address the problem within the HTTP domain (for example because Javascript can only make HTTP requests), there are two possible solutions, and both require “middleware” to bridge between HTTP and XMPP. The solutions are “polling” (repeatedly sending HTTP requests asking “is there new data for me?”) and “long polling”, aka BOSH. The idea behind BOSH is to exploit the fact that the server doesn’t have to respond as soon as it gets a request. The response is delayed until the server has data for the client, and only then is it sent. As soon as the client gets it, it makes a new request (even if it has nothing to send) and so forth.
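The “hold the request until there is something to say” behavior can be sketched in a few lines. This is a toy model of the middleware side, not how Openfire implements it: the class and method names are hypothetical, and a real BOSH connection manager also handles timeouts, request IDs and more than one held request.

```javascript
// Toy long-polling channel for a single client.
class LongPollChannel {
  constructor() {
    this.pendingData = [];   // data that arrived while no request was waiting
    this.heldRequest = null; // respond callback of the request currently held open
  }

  // Called when an HTTP request from the client arrives.
  onRequest(respond) {
    if (this.pendingData.length > 0) {
      respond(this.pendingData.splice(0)); // answer immediately with everything queued
    } else {
      this.heldRequest = respond;          // nothing to send yet: keep the request open
    }
  }

  // Called when the XMPP side produces data for the client.
  onData(item) {
    if (this.heldRequest) {
      const respond = this.heldRequest;
      this.heldRequest = null;
      respond([item]);                     // complete the held request now
    } else {
      this.pendingData.push(item);         // wait for the client's next request
    }
  }
}
```

With plain polling the client would have to keep asking on a timer. Here a request is answered only when data exists, and the client is expected to issue the next request right away.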

BOSH is much more efficient, both in terms of server load and traffic-wise. In this tutorial I set up the Openfire XMPP server (which also provides the BOSH functionality) with the JSJaC library as client, using Apache as the web server on Ubuntu 10.04. Openfire has a Debian package, so installation is fairly easy. Just download the package and install it. After installation, browse to port 9090 on the machine it was installed on, and from there it’s an easy web-driven setup. If you choose to use MySQL as Openfire’s DB, make sure to create a dedicated database beforehand (mysqladmin create).

After the initial setup, I wasn’t able to log in with the “admin” user. This post solved my problem. openfire.xml is located at /etc/openfire (if you installed from the package); you will need root privileges to edit it, and then restart Openfire (sudo /etc/init.d/openfire restart). Other than that everything worked fine. The Openfire server (like all major XMPP servers) provides the BOSH functionality, aka “HTTP Binding” aka “Connection Manager”. By default it listens on port 7070, at the path “/http-bind/” (the trailing slash is important).

To make sure it works (this is the part I couldn’t find anywhere, which is why it took me a long time to resolve all the problems) I used “curl”, a very handy tool (sudo apt-get install curl). To test the “BOSH server”:
# curl -d "<body rid='123456' xmlns:xmpp='urn:xmpp:xbosh' />" http://localhost:7070/http-bind/

Replace “localhost” with your server name, and notice the trailing slash. The expected result should look like:

<body xmlns="http://jabber.org/protocol/httpbind" xmlns:stream="http://etherx.jabber.org/streams" authid="2b10da3b" sid="2b10da3b" secure="true" requests="2" inactivity="30" polling="5" wait="60"><stream:features><mechanisms xmlns="urn:ietf:params:xml:ns:xmpp-sasl"><mechanism>DIGEST-MD5</mechanism><mechanism>PLAIN</mechanism><mechanism>ANONYMOUS</mechanism><mechanism>CRAM-MD5</mechanism></mechanisms><compression xmlns="http://jabber.org/features/compress"><method>zlib</method></compression><bind xmlns="urn:ietf:params:xml:ns:xmpp-bind"/><session xmlns="urn:ietf:params:xml:ns:xmpp-session"/></stream:features></body>
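When troubleshooting, it can help to pull the interesting attributes out of that response instead of eyeballing one long line. The sketch below does so with regular expressions. It is a quick diagnostic under the assumption that the response looks like the one above, not a real XML parser, and the function names are made up.

```javascript
// Extract the attributes of the opening <body ...> tag of a BOSH response.
function parseBoshBodyAttributes(xml) {
  const openTag = /<body\b([^>]*)>/.exec(xml);
  if (!openTag) return null; // not a BOSH body at all (for example an HTML error page)
  const attributes = {};
  const pattern = /([\w:.-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/g;
  let match;
  while ((match = pattern.exec(openTag[1])) !== null) {
    attributes[match[1]] = match[2] !== undefined ? match[2] : match[3];
  }
  return attributes;
}

// A session was created if the server handed out a session ID ("sid")
// and did not terminate the session (type="terminate").
function sessionCreated(xml) {
  const attributes = parseBoshBodyAttributes(xml);
  return attributes !== null && Boolean(attributes.sid) && attributes.type !== "terminate";
}
```

For the response above, parseBoshBodyAttributes returns an object whose sid is “2b10da3b” and whose wait is “60”, which is what a working setup should produce.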

Once verified, we can continue with the next step. Since the client is Javascript-based, all Javascript restrictions enforced by the browser apply. One of these restrictions is the “same origin policy”. It means that Javascript can only send HTTP requests to the domain (and port) it was loaded from, and since it is served over HTTP (port 80) it can’t make requests to port 7070. Solution: the Javascript client will make requests to the same domain and port. The requests will be forwarded locally to port 7070 by Apache. I guess you can use the same method to forward even to a different server, but I didn’t try. I configured forwarding following this post, but there is probably more than one way to do it.
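A simplified version of the comparison the browser makes is sketched below, using the standard URL class. The function name and example host are hypothetical, and real browsers apply more rules than this.

```javascript
// Roughly decide whether a request from a page to a target URL stays within the same origin
// (same scheme, host and port). Relative targets are resolved against the page URL.
function isSameOrigin(pageUrl, targetUrl) {
  const page = new URL(pageUrl);
  const target = new URL(targetUrl, pageUrl);
  // Pages opened from the local disk (file://) have an opaque origin, serialized as "null",
  // which is treated here as never matching anything.
  if (page.origin === "null" || target.origin === "null") return false;
  return page.origin === target.origin;
}

const directToOpenfire = isSameOrigin(
  "http://example.com/client.html",
  "http://example.com:7070/http-bind/"
); // false: the port differs
const throughApache = isSameOrigin("http://example.com/client.html", "/http-bind"); // true
```

This is why the client is pointed at a path on the server it was loaded from, leaving it to Apache to pass the request on to port 7070.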

Add to /etc/apache2/httpd.conf the following lines (root privileges needed):
LoadModule proxy_module /usr/lib/apache2/modules/mod_proxy.so
LoadModule proxy_http_module /usr/lib/apache2/modules/mod_proxy_http.so

Then, add to /etc/apache2/apache2.conf the following lines (root privileges needed):
ProxyRequests Off
ProxyPass /http-bind http://localhost:7070/http-bind/
ProxyPassReverse /http-bind http://localhost:7070/http-bind/
ProxyPass /http-binds http://localhost:7443/http-bind/
ProxyPassReverse /http-binds http://localhost:7443/http-bind/

Now restart Apache (sudo /etc/init.d/apache2 restart) and make sure it starts properly. To verify that the forwarding works, use the same curl method, this time as a request to Apache:
# curl -d "<body rid='123456' xmlns:xmpp='urn:xmpp:xbosh' />" http://localhost/http-bind

The result should be the same as before. If it doesn’t work, there is a problem with the forwarding. Update: if you get a “403 Forbidden” error from Apache, this may help (thanks Tristan). Once it is working, the server side is ready. On the client (in my case JSJaC), you should specify the BOSH/HTTP Bind “backend” (as opposed to “polling”). For the “http bind” URL just use “/http-bind” and everything should work. Notice that if you open the client locally on your desktop (not served by Apache) it won’t work because of the “same origin policy” mentioned before.

I hope you find this tutorial useful. It sure could have helped me… :)