The Facebook unliker
I was looking for an easy way to remove my Facebook “likes”. I couldn’t find one, so I made a little app to the rescue :)
Check out: http://frishit.com/unliker
Sorry for not publishing anything for a long time. I’ve been busy finishing my bachelor’s and finding a job. Anyhow, I found out that my A-Z game won July’s DreamHost site of the month contest, which is really cool. It also means more traffic and more people cheating the system (FUN!), so I applied some of the protections I wrote about here…
I’m currently out of ideas for cool things to write about, so if you have some ideas or want me to write about anything in particular, this is the time :)
Throughout the history of computing, there have always been programs trying to mimic human actions and behavior. From ELIZA in the sixties to the present day, the task arises in different fields of computer science: artificial intelligence, natural language processing, image processing, etc. A particular type of program is designed to mimic or automate human-computer interactions: it produces inputs for other programs as if they came from a person. In this post I’m going to introduce these programs, how they work, why it is important to detect (and stop) them, how it is done today, and some new ideas I have on the subject.
The power of automation can be harnessed, like all powers, for good and evil. On one hand it can save you a lot of time on tedious repetitive tasks such as checking news, downloading files, handling events, etc. On the other hand it can be used by advertisers, spammers and solicitors for mass-mailing, mass-registering, mass-spamming, etc. As you can guess, some organizations would like to use such automation while others would like to block it.
Sometimes it is unclear which side is which. For example, if I want to download a file from a service like RapidShare, it presents me with a waiting screen. The sole purpose of this screen, as well as the other restrictions imposed on non-paying members, is to motivate people to pay. Using automation, my computer could do the waiting and download the files while I’m away, which is pretty good from my selfish point of view but pretty bad from their business point of view.
Automation programs can generate data at different levels. To better understand this, let’s first examine a typical web interaction: talk-backs. When you talk back or post a comment, you use your browser, keyboard and probably mouse to select (focus on) the text area designed for this purpose, type whatever you have to say and click “submit” or whatever. What happens under the hood is that your browser sends data to the website, or to be more exact, makes an HTTP request to the web server running at the other end, carrying data that includes your talk-back. The web server processes this data and responds according to predefined business logic, such as answering “thank you” and placing your talk-back in a queue for approval.
The talk-back is sent along with cookies and other data as defined in RFC 2616. Since the interaction is well-defined per site (the protocols, parameter names and everything else used as part of the interaction), it’s fairly easy to make a tool that automatically sends data as if it were a web browser operated by a person. From the web server’s point of view it looks exactly the same. Once the interaction is analyzed and understood (easily done using a proxy or tools like Firebug), a dedicated tool can be built to automate it, for example using the java.net package, or even as a shell script using curl. Please note that these are not hacker utilities; they’re pretty standard building blocks for any web-related application or script.
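To make this concrete, here is a rough sketch of the kind of request such a tool has to reproduce, written as browser-style JavaScript (the language the rest of this post revolves around); a standalone tool would send the exact same POST with curl or java.net, and the URL and field names below are invented for illustration:

var xhr = new XMLHttpRequest();
xhr.open("POST", "http://example.com/talkback", true); // invented URL
xhr.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
// invented field names; the real ones are learned by sniffing the interaction
xhr.send("article_id=42&author=bob&comment=" + encodeURIComponent("me too!"));

As far as the web server can tell, this POST is indistinguishable from a person typing a talk-back and clicking “submit”.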
So how can we determine whether it’s really a person using a browser or an automated tool messing with us? The most common solution today is CAPTCHAs. I’m sure you’ve all seen them before. A CAPTCHA is a means of challenge-response in which the web server generates a picture, usually containing twisted text, and expects the text to be sent back as data. For a human it’s supposedly easy to read the picture and type the text in the required field, while for a computer program it’s supposedly difficult to analyze the picture and “understand” the text.
The original idea behind CAPTCHA is nice, but it is becoming increasingly ineffective. Firstly, it is quite possible to use Computer Vision techniques, alone or together with Artificial Intelligence techniques, to recognize the text. Secondly, with publicly available CAPTCHA-solving libraries such as PWNtcha or online services such as DeCaptcher, it’s really a matter of how much time/money one is willing to spend rather than a technological challenge. There are also “indirect” ways to overcome CAPTCHA, such as analyzing the audio CAPTCHA (sometimes available for accessibility purposes) or passing the CAPTCHA images to a different (preferably high-traffic) web site to be solved by real humans.
As CAPTCHA breakers close the gap, it’s time to present them with new challenges. Don’t get me wrong, I’m not picking a side. I’ve been on both sides, and my interest is purely intellectual, so whoever has the upper hand at the moment is irrelevant. I have some ideas for new challenges. Feel free to implement/use/further develop them. I take no responsibility, and I can assure you they are breakable, but they should take the game to the next level.
Let’s first examine the Achilles’ heel of current methods: the challenge is a single static image, it is essentially the same kind of challenge every time, it can be handed off to external solvers (human or machine), and the expected response has nothing to do with the page’s content.
To raise the bar we have to address those problems. There’s a lot of room for creativity, and I’m not talking about keeping the current CAPTCHA approach with a new kind of image, as shown here. I’m talking about a new approach. Modern browsers are quite sophisticated, and we can use their advanced capabilities as part of the challenge-response, for example their JavaScript and rendering engines. Instead of challenging automation tools with “analyze this image” we can challenge them with “execute this script”. This would force them to either implement a JavaScript engine or drive a real browser as part of the automation. Both demand a higher level of complexity.
Then comes the randomness element: the script should be different each time it is served. This can be achieved by dynamically generating a self-decrypting script. The decryption should take place during script execution, meaning that instead of sending encrypted cipher-text, a decryption script and a key (as in traditional systems), there should be one script that decrypts itself little by little using “evals”: decrypt a block of commands, then execute it. Execution performs whatever the original commands were intended to do and decrypts the next block. This prevents attempts to decrypt or analyze the script without executing it, thus enforcing the use of a JavaScript engine and achieving the randomness element.
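To illustrate, here is a minimal sketch of such a chain. Everything about it is an assumption for demonstration purposes: it uses a trivial (and insecure) XOR “cipher” to stay short, and it encrypts its own demo blocks inline so the snippet actually runs, whereas a real system would serve server-generated ciphertext blocks and a per-request key:

// XOR is symmetric, so the same function encrypts and decrypts
function xorBytes(str, key) {
  var out = "";
  for (var i = 0; i < str.length; i++)
    out += String.fromCharCode(str.charCodeAt(i) ^ key.charCodeAt(i % key.length));
  return out;
}

var key = "per-request-key"; // would be generated server-side per request
var blocks = [ // would arrive from the server as opaque ciphertext
  xorBytes("console.log('block 1 ran'); runNextBlock();", key),
  xorBytes("console.log('block 2 ran: render the challenge here');", key)
];
var next = 0;

function runNextBlock() {
  // each decrypted block does its work and ends by calling runNextBlock(),
  // so the script can only be unraveled by actually executing it
  if (next < blocks.length) eval(xorBytes(blocks[next++], key));
}
runNextBlock();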
The script should present a challenge whose response is context-dependent; for example, for talk-backs or comments you can ask about the article’s content. Beware that it should be implemented smartly. If you ask a multiple-choice question, an automation tool will try all choices. If the response is a single word from the article, an automation tool may try them all. Anyhow, if it’s inapplicable or you feel it’s more harmful than helpful (by narrowing down the response space), you can always revert to a letters/numbers combination.
The challenge should not consist of an image only. It should be partially an image, maybe as background, combined with dynamically rendered pixels/lines/polygons/curves or placement of smaller image portions using the browser’s rendering engine. Together they should visually form the “question” (or letters/numbers combination) mentioned in the previous paragraph. This step makes it more difficult for an automation tool to process/copy/understand the challenge.
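As a sketch of what this could look like with the canvas element (the canvas id, background URL and challenge text are all placeholders):

var canvas = document.getElementById("challenge"); // assumes <canvas id="challenge">
var ctx = canvas.getContext("2d");
var bg = new Image();
bg.onload = function () {
  ctx.drawImage(bg, 0, 0); // server-generated background fragment
  // noise curves drawn by the browser itself, not baked into any image
  for (var i = 0; i < 5; i++) {
    ctx.beginPath();
    ctx.moveTo(Math.random() * canvas.width, Math.random() * canvas.height);
    ctx.quadraticCurveTo(Math.random() * canvas.width, Math.random() * canvas.height,
                         Math.random() * canvas.width, Math.random() * canvas.height);
    ctx.stroke();
  }
  ctx.font = "24px sans-serif";
  ctx.fillText("type: x7q2", 20, 50); // the text itself would come from the decrypted script
};
bg.src = "/challenge-bg.png"; // placeholder URL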
Finally, you can add your own spice just to make things more complex. For example, the running script can record the time differences between key presses in the response field and send them along with the response. You can use statistical analysis to determine whether it came from a human (generally, if the field is auto-filled by a robot, even with a random wait between key presses, the standard deviation will be relatively low). This is only one example. You can also invent new things, add random business logic, use self-modifying code… the possibilities are endless.
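A sketch of the timing idea (the element ids, including the hidden “timings” field, are made up):

var stamps = [];
document.getElementById("answer").onkeydown = function () { // assumes <input id="answer">
  stamps.push(new Date().getTime());
};
document.getElementById("challenge-form").onsubmit = function () {
  var deltas = [];
  for (var i = 1; i < stamps.length; i++) deltas.push(stamps[i] - stamps[i - 1]);
  // ship the inter-key intervals with the response; the server checks their
  // standard deviation (near-constant gaps hint at a robot)
  document.getElementById("timings").value = deltas.join(",");
};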
I hope you liked my ideas. Let me know what you think.
It’s been a while since my last post; I’ve been busy lately. You may have noticed that I made a nice Web 2.0 game, “How fast can you type the alphabet”. The game is simply about typing a-z as fast as you can, and it keeps track of your best score. Currently three alphabets are supported: English, Russian and Hebrew. To make it more competitive, I made an “all-time best 5” scoreboard.
For the scoreboard I needed to query the database (MySQL) for the top 5 scores per language. I didn’t think it was going to be an issue, definitely not big enough to dedicate a post to, but as I soon found out, I was very wrong. Here is my scores table structure:
+--------+---------------------+------+-----+-------------------+-------+
| Field  | Type                | Null | Key | Default           | Extra |
+--------+---------------------+------+-----+-------------------+-------+
| id     | bigint(20) unsigned | NO   | PRI | NULL              |       |
| input  | tinyint(3) unsigned | NO   | PRI | NULL              |       |
| score  | float unsigned      | NO   |     | NULL              |       |
| date   | timestamp           | NO   |     | CURRENT_TIMESTAMP |       |
| locale | tinyint(3) unsigned | NO   | PRI | NULL              |       |
+--------+---------------------+------+-----+-------------------+-------+
All fields are pretty self-descriptive except maybe “input”. This field represents the input device that was used to type the alphabet; for example, typing on a computer keyboard is probably much faster than typing on your iPhone or Android, and iPad is probably somewhere in between. This feature is for future use, so for now I only collect the data but don’t show it. The triplet <id, input, locale> is used as the primary key.
I’m no DBA; maybe I should have used id as the primary key with a “unique” constraint on the triplet, but the idea is that a player can have only one score per language (locale) and input device. Back to my scoreboard query: I need to select the top 5 players per language. After playing a while with basic queries, I figured out it can’t be done with a simple “group by locale” query, as I can’t limit the number of results per group nor explicitly tell it to take the top X scores. If I needed only the best score per language I could have simply used:
select id, min(score), locale from scores group by locale
Unfortunately there is no similar function to get the N minimal scores. At this point I had two choices: either googling for a solution (one query that gets everything done) or using three queries, one per language, and programmatically combining the results. Maybe I should have chosen the latter; I admit I didn’t compare overall performance. However, I didn’t like the idea of my DB being queried three times each time the scoreboard is refreshed (which happens quite often). Another possible solution is to show only the current language’s scoreboard (therefore having only one query, for the current language). This was unacceptable because it means a new AJAX request each time a player switches between languages, and since most expected players are multilingual, that’s more requests for the web server to process.
Once again, I did not compare overall performance using statistical usage information, so it is possible that I was wrong about which tradeoff is preferable. Anyhow, my decision was to use one AJAX request to get the whole scoreboard, and that’s what this post is really about. So I googled for solutions and found a couple. Some use user-defined variables for the inner counting; some use relatively unreadable (at least for me) queries with inner/outer/left joins. The most readable query I found was:
select s1.id, score, locale
from scores s1
where 5 > (select count(*) from scores s2
           where s1.locale = s2.locale and s1.score > s2.score)
group by locale, s1.id
order by locale, score
Which can be interpreted as: for each score, count how many scores are lesser (in the same language), then select it only if the number of lesser scores is less than 5 (meaning the score is in the top 5); finally group the results by language and then by player id. We must also group by id because otherwise we’d get only one result per language. Pretty readable, in my opinion. As for performance, it does run a new sub-query for each row, so maybe it takes a little more CPU or time, but there is no huge Cartesian product.
But wait. Remember the “input” field? It is possible that the same player has more than one score per language, and if both of his scores are in the top 5 I don’t want to show them both. For the current version of the game I want to show only the best of his scores and count it as one result for the scoreboard. On first thought, replacing “score” with “min(score)” should do the job, but it doesn’t: it will show only the best of his scores, yet still count both when counting the number of lesser scores. So my next move, I figured, was replacing “scores” with a sub-query that returns a filtered scores table, holding only the best score per player and language. The sub-query would look like:
select id, min(score) as score, input, locale from scores group by locale, id
Then of course I need to exclude blacklisted users who tried to tamper with the system, so:
select id, min(score) as score, input, locale from scores where id not in (select id from blacklist) group by locale, id
And since I’m on shared hosting and my hosting provider doesn’t allow creating or using views, I must literally take this query and replace every instance of “scores” in the first query with it. The result is so ugly I won’t even write it. Then I asked a DBA friend of mine for advice. She told me to try analytic functions and gave me some examples for Oracle DB. When I looked for analytic functions in MySQL I found the great article “Emulating Analytic (AKA Ranking) Functions with MySQL” from O’Reilly. In fact, their explanation of analytic functions is so good I won’t even repeat it (if you’re unfamiliar with the subject, you can read it there).
Following this article, I came up with this query:
select *
from (select e.locale, e.id, e.score,
             find_in_set(e.score, x.scoreslist) as rank
      from (select locale, group_concat(score order by score) as scoreslist
            from (select locale, min(score) as score
                  from scores
                  where id not in (select id from blacklist)
                  group by locale, id) as k
            group by locale) as x,
           scores as e
      where e.locale = x.locale) as z
where rank <= 5 and rank > 0
order by locale, rank
At first glance it might look ugly, but let me assure you it’s far more readable, and apparently more efficient, than the first alternative. The innermost sub-query (k) returns the filtered scores table I discussed before: best score per player and language. Then we use group_concat to form a table (x) that looks like, e.g.:
+--------+---------------------------------+
| locale | scoreslist                      |
+--------+---------------------------------+
| 1      | 1.75,2.129,2.85,6.34,9,10,11,12 |
| 2      | 2.185,4.12,8.32                 |
| 3      | 2.4                             |
+--------+---------------------------------+
Now we have a sorted list of scores per language. We take the Cartesian product with the scores table (e), joining them by language. The product isn’t huge, as the number of languages is fixed and small, so we’re still in the same order of magnitude. The result is the original list of scores, but now each row additionally carries the scores list for its language. Next, we give each score a rank according to its location in that list and filter so that only ranks in the range 1-5 are returned. How simple is that?
Problem solved. Note: group_concat has a maximal result length (1024 bytes by default, controlled by the group_concat_max_len variable). Although it can be increased, the method presented won’t work if you need all your records sorted and grouped; but for the purpose of top N, assuming N is relatively small, it works very well.
All in all, for my “a-z” game it probably doesn’t matter which of the methods I’d have used, but it was quite educational and it’s good knowledge for real projects to come. As said before, DB is not my field of expertise, so I’d appreciate it if someone experienced would comment on the things I wrote or shed some light on the things I didn’t.
Do you remember the good old “how fast can you type a-z” contest? Well, I recently needed to study jQuery and I always wanted to develop using Facebook’s API, so I made this little webapp for old times’ sake. I think it’s pretty cool!
I’m a big fan of open source, not only in the context of the open source vs. proprietary debate but also of its spirit. I truly believe that by sharing ideas, information and communicating with each other we can make this world a better place. I could make the rest of this post all about how good it is and give tons of examples of wonderful things that exist only thanks to open source…
However, sharing and talking don’t make things work. Someone has to turn those ideas into actual products, and once they’re made, someone has to maintain and improve them. Who is this someone and why does he do it? For years I didn’t care. As long as the universe provided me with great free software, I thought, I should use it, and so I enjoyed the alternatives to Microsoft products (when they dominated the earth) and I surely enjoyed the LAMP stack when I built my first web site, as I still do to this day. In fact, these days most of the software I use is open source, whether it’s my desktop, development environment, graphic tools, office suite, this blog, or even Cydia.
What did I give back? Not much. Well, I did some beta testing here and there, filed little bug reports, recently wrote plugins for FRD (the best download manager for files hosted on public file-hosting services) and even tried to promote the “day against DRM”, so I probably did more than the average end user (who doesn’t care at all). But still, not much, and I always had a personal interest: usually software or features I desired, and sometimes ideology.
Does it make me a bad person? Selfish? On the contrary. Why would anyone contribute to something he has no interest in? The fact that I made a contribution (even the smallest) means I care, and I do. If I didn’t embrace open source and use it on a daily basis, I probably wouldn’t have done anything to help it, and I think herein lies its beauty: software is created and publicly released for the sole purpose of others’ enjoyment. If it’s any good, it becomes common and widespread. Because it has many users, bugs are revealed, and they’re either reported or fixed by the end users themselves. Then a fixed version is released and accessible to everyone.
And it’s OK not to care, and it’s OK not to contribute. After all, that’s what made me choose it. That’s what made me who I am, and I guess only after I had seen this wealth of freeness did I realize its real value, which is far more than just software that doesn’t cost money. Sadly (or not), corporations realized it too and embraced it as part of their business model. They either open-source their product and sell professional services, or open-source a crippled version of their product and sell the full version. Both ways are legit, but they raise new problems: rival companies with products that do the same thing each build up their own communities, making those communities smaller and less effective; and why would anyone voluntarily debug/beta-test/fix a corporation’s software when he could get paid for it?
So who really are those people who make open source work? They’re the good guys, I guess, who make things better for a greater good, little by little. Whether we care or not, we all use their products (directly or indirectly), and for that we should be thankful. I appreciate them all, and I hope people (myself included) will get more involved and spread the spirit of openness!
It seems like only yesterday that I moved from Blogspot to wordpress.com, and now I’m moving again. Yes! I finally have my own domain, which is quite exciting!! The new URL, if you haven’t noticed yet:
http://frishit.com
All RSS followers, please update the feed URL. Actually, if you’re reading this from your RSS client it means you don’t have to update it, coz’ I write this post on my new domain; but one year from now the DNS forwarding will stop working, so keep that in mind.
If you can’t read this you do have to update the feed URL… :)
This tutorial explains how to set up and troubleshoot an XMPP server with BOSH. I’m not getting into what XMPP is and what it’s good for. The first two paragraphs are theoretical. XMPP is a stateful protocol in a client-server model, so if a web application needs to work with XMPP, a few problems arise. Modern browsers don’t support XMPP natively, so all XMPP traffic must be handled by a program running inside the browser (JavaScript/Flash etc…). The first problem is that HTTP is a stateless protocol, meaning each HTTP request isn’t related to any other request. This problem can be addressed by applicative means, for example by using cookies/post data.
The second problem is the unidirectional nature of HTTP: only the client sends requests, and the server can only respond. The server’s inability to push data makes it unnatural to implement XMPP over HTTP. The problem disappears if the client program can make direct TCP connections (thus eliminating the need for HTTP). However, if we want to address the problem within the HTTP domain (for example because JavaScript can only forge HTTP requests), there are two possible solutions, both requiring “middleware” to bridge between HTTP and XMPP. The solutions are “polling” (repeatedly sending HTTP requests asking “is there new data for me?”) and “long polling”, aka BOSH. The idea behind BOSH is to exploit the fact that the server doesn’t have to respond as soon as it gets a request: the response is delayed until the server has data for the client, and only then is it sent. As soon as the client gets it, it makes a new request (even if it has nothing to send), and so forth.
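The client side of such a loop can be sketched roughly like this (heavily simplified: a real BOSH client such as JSJaC also maintains rid/sid session bookkeeping per XEP-0124, omitted here):

function handleStanzas(xml) {
  // parse and dispatch whatever the server was holding back (stub)
}

function longPoll(body) {
  var xhr = new XMLHttpRequest();
  xhr.open("POST", "/http-bind", true);
  xhr.onreadystatechange = function () {
    if (xhr.readyState === 4) {
      handleStanzas(xhr.responseText);
      // immediately open the next request, even with nothing to send,
      // so the server always has a pending request it can answer on
      longPoll('<body xmlns="http://jabber.org/protocol/httpbind"/>');
    }
  };
  xhr.send(body);
}

// kick off (a real session starts with a session-creation <body/>)
longPoll('<body xmlns="http://jabber.org/protocol/httpbind"/>');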
BOSH is much more efficient in terms of both server load and traffic. In this tutorial I set up the Openfire XMPP server (which also provides the BOSH functionality) with the JSJaC library as client, using Apache as the web server on Ubuntu 10.04. Openfire has a Debian package, so installation is fairly easy: just download the package and install it. After installation, browse to port 9090 on the machine it was installed on, and from there it’s an easy web-driven setup. If you choose to use MySQL as Openfire’s DB, make sure to create a dedicated database beforehand (mysqladmin create).
After the initial setup, I wasn’t able to log in with the “admin” user. This post solved my problem. openfire.xml is located at /etc/openfire (if you installed from the package); you will need root privileges to edit it and then restart Openfire (sudo /etc/init.d/openfire restart). Other than that, everything worked fine. The Openfire server (like all major XMPP servers) provides the BOSH functionality, aka “HTTP Binding” aka “Connection Manager”. By default it listens on port 7070, at “/http-bind/” (the trailing slash is important).
To make sure it works (this is the part I couldn’t find anywhere, which is why it took me a long time to resolve all the problems) I used curl, a very handy tool (sudo apt-get install curl). To test the “BOSH server”:
# curl -d "<body rid='123456' xmlns:xmpp='urn:xmpp:xbosh' />" http://localhost:7070/http-bind/
Replace “localhost” with your server name, and notice the trailing slash. The expected result should look like:
<body xmlns="http://jabber.org/protocol/httpbind" xmlns:stream="http://etherx.jabber.org/streams" authid="2b10da3b" sid="2b10da3b" secure="true" requests="2" inactivity="30" polling="5" wait="60"><stream:features><mechanisms xmlns="urn:ietf:params:xml:ns:xmpp-sasl"><mechanism>DIGEST-MD5</mechanism><mechanism>PLAIN</mechanism><mechanism>ANONYMOUS</mechanism><mechanism>CRAM-MD5</mechanism></mechanisms><compression xmlns="http://jabber.org/features/compress"><method>zlib</method></compression><bind xmlns="urn:ietf:params:xml:ns:xmpp-bind"/><session xmlns="urn:ietf:params:xml:ns:xmpp-session"/></stream:features></body>
Once that is verified, we can continue with the next step. Since the client is JavaScript-based, all JavaScript restrictions enforced by the browser apply. One of these restrictions is the “same origin policy”. It means that JavaScript can only send HTTP requests to the domain (and port) it was loaded from, and since it is served over HTTP (port 80), it can’t make requests to port 7070. Solution: the JavaScript client will make requests to the same domain and port, and the requests will be forwarded locally to port 7070 by Apache. I guess you can use the same method to forward even to a different server, but I didn’t try. I configured the forwarding following this post, but there is probably more than one way to do it.
Add to /etc/apache2/httpd.conf the following lines (root privileges needed):
LoadModule proxy_module /usr/lib/apache2/modules/mod_proxy.so
LoadModule proxy_http_module /usr/lib/apache2/modules/mod_proxy_http.so
Then, add to /etc/apache2/apache2.conf the following lines (root privileges needed):
ProxyRequests Off
ProxyPass /http-bind http://localhost:7070/http-bind/
ProxyPassReverse /http-bind http://localhost:7070/http-bind/
ProxyPass /http-binds http://localhost:7443/http-bind/
ProxyPassReverse /http-binds http://localhost:7443/http-bind/
Now restart Apache (sudo /etc/init.d/apache2 restart) and make sure it starts properly. To verify the forwarding works, use the same curl method, this time with a request to Apache:
# curl -d "<body rid='123456' xmlns:xmpp='urn:xmpp:xbosh' />" http://localhost/http-bind
The result should be the same as before; if it isn’t, there is a problem with the forwarding. Update: if you get a “403 Forbidden” error from Apache, this may help (thanks Tristan). Once it is working, the server side is ready. On the client (in my case JSJaC), you should specify the BOSH/HTTP-bind “backend” (as opposed to “polling”). For the “http bind” URL just use “/http-bind” and everything should work. Notice that if you open the client locally on your desktop (not served by Apache) it won’t work, because of the “same origin policy” mentioned before.
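For reference, connecting with JSJaC then looks roughly like this. The argument names are from memory of JSJaC’s example client, so verify them against the library’s documentation; the domain and credentials are placeholders:

var con = new JSJaCHttpBindingConnection({ httpbase: "/http-bind" });

con.registerHandler("onconnect", function () {
  // connected: send presence, fetch the roster, etc.
});

con.connect({
  domain: "example.com", // your XMPP domain
  username: "someuser",
  resource: "webclient",
  pass: "secret"
});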
I hope you find this tutorial useful, it sure could have helped me… :)
Today was the first time I used iPhone tethering. Tethering means connecting a computer to the internet using the iPhone’s cellular connectivity (3G/EDGE): the iPhone connects to the computer via USB cable or Bluetooth and is used as a cellular modem. It’s useful when you’re away with a laptop and there are no wireless hotspots around.
There are many tutorials out there; however, I found most of them misleading (advising to use third-party software or to jailbreak your iPhone, both unnecessary) and none of them covers Linux. This tutorial is really easy and simple.
Step 1 – Enable internet sharing on your iPhone
On your iPhone go to Settings -> General -> Network -> Internet sharing -> On. If “Internet sharing” isn’t shown, open Safari and browse to: http://help.benm.at. On this webpage click “Tethering”, then select your country and your mobile carrier. Accept the profile and reboot your iPhone. After the reboot, the “Internet sharing” option should appear. In my case, my carrier wasn’t on the list, so I generated a custom file (the last option). The APN/User/Password were easily found using Google. However, “Internet sharing” stayed hidden; the funny thing is that after I removed the custom profile it magically appeared. My iPhone version is 3.1.2 and it worked. I noticed they have a warning for 3.1.3, so watch out if you’ve got it.
Step 2 – Install necessary software on Ubuntu
As always, I provide instructions for Ubuntu. The driver you need is called “ipheth” and it depends on “libimobiledevice”. You can find them both in Lucid’s official repositories; however, only version 0.9.7 of libimobiledevice is there, and while it may be enough for tethering, I recommend the newer version, which can be obtained from Paul McEnery’s PPA (for Karmic it’s still version 0.9.7):
# sudo add-apt-repository ppa:pmcenery/ppa
Executing: gpg --ignore-time-conflict --no-options --no-default-keyring --secret-keyring /etc/apt/secring.gpg --trustdb-name /etc/apt/trustdb.gpg --keyring /etc/apt/trusted.gpg --primary-keyring /etc/apt/trusted.gpg --keyserver keyserver.ubuntu.com --recv 3AE22276BF4F39C8D6117D7F4EA3A911D48B8E25
gpg: requesting key D48B8E25 from hkp server keyserver.ubuntu.com
gpg: key D48B8E25: public key "Launchpad PPA for Paul McEnery" imported
gpg: Total number processed: 1
gpg: imported: 1 (RSA: 1)
# sudo apt-get update
Now, install the driver (and its dependencies, which include the library):
# sudo apt-get install ipheth-utils
Load the driver into the running kernel:
# sudo modprobe ipheth
At this point, if you run “dmesg”, the last line should show something like “usbcore: registered new interface driver ipheth”.
Step 3 – Plug your iPhone using USB cable
As soon as you connect your device, a new interface should be plumbed, configured and ready to use (in case you use NetworkManager). You can get the list of your interfaces with “ifconfig -a”. On your iPhone there should be a label saying “Internet Tethering”. If you don’t use NetworkManager, “Internet Tethering” is triggered when you make a DHCP request on that interface (for example, using “dhclient”).
That’s it. As promised, easy and simple. Thanks to Paul McEnery, there is plenty of other cool iPhone-related stuff you can install from his PPA (on Lucid).
Enjoy!
I always needed a test environment for my destructive experiments. A sandbox, if you will: a place where I can do whatever I want without worrying about the consequences. I’m tired of destroying my operating system. I’m talking about those times when I modify and rebuild kernel modules/kernels/libraries for experimental purposes, such as making KVM support Mac OS X, making the netfilter/iptables module support advanced connection tracking, etc. This time, I needed to patch the rtl8187 kernel module (a wireless driver) to allow packet injection. I decided to solve the problem once and for all.
The problem, if you didn’t follow, is that these kinds of changes might have a destructive influence on the running operating system. There are basically two different approaches to address this:
1. Take a snapshot, or “point in time”, of the operating system, and when the experiment is finished return to that point.
2. Experiment in a “sandbox” where no one cares what gets ruined and nothing can harm the operating system.
I prefer the second approach, because when you roll back to a certain point in time everything rolls back, and since I’m always doing more than one thing, I don’t want my other stuff to roll back as well. Besides, that kind of solution is always “heavy”, as the whole system needs to be compared to the way it was before. Sure, you can use timestamps and more sophisticated comparisons to make it quicker, but unless you use real snapshots (as in LVM or ZFS), it’s not ideal.
So we go with the sandbox approach. Once again we have a few options. We could create a chroot jail, but it might have problems accessing physical devices (I never actually tried) and I don’t really like the way the chrooted environment is created. We could use a “live” operating system that runs directly from memory (no changes are made to disk), but then only one operating system can run at a time, which means that during the potentially destructive session my normal operating system won’t be available, which is exactly when I might need it.
A virtual machine seems like an adequate solution. It fulfills the sandbox demand, it’s easy to set up, and some virtualization platforms even have snapshot capabilities. The only problem is that it doesn’t always have access to physical devices. However, in this case the device is USB-based and therefore accessible from within the virtual machine as well. As the virtualization platform I chose VirtualBox (3.1.8): VMware doesn’t support my processor’s virtualization capabilities, so it’s out of the game, and VirtualBox performs better than KVM, at least on my computer, especially when it comes to I/O performance. Now I need to choose an adequate operating system for the virtual environment.
I want the operating system to be as minimal as possible: no need for fancy graphical environments, an office suite, web browsers, etc… I started looking for candidates, but man, there are so many distros out there! I would have simply used Ubuntu because I’m used to it, it has huge software repositories and it would be very similar to my actual operating system; the only problem is it comes with loads of unnecessary software. I considered installing Debian exactly for that reason, and then I found out about Ubuntu’s JeOS (thanks Amir). JeOS stands for “just enough operating system” and it’s simply Ubuntu’s core. It seemed perfect for my needs. JeOS comes with the Ubuntu Server edition.
The installation is fairly simple and quite fast. On the first installation screen you need to press F4 and select “Install a minimal system”, as shown. There is also a VMware/KVM-optimized version (“Install a minimal virtual machine”), but it’s not optimized for VirtualBox, so I chose the minimal server option. The rest of the installation follows simple screens. When I had the option to choose software packages, I chose the basic Ubuntu Server and OpenSSH.
Boot time is also impressive, leaving us with an old-school tty login screen. Cool. The next thing is to install the VirtualBox Guest Additions to make interaction smoother and better. A few prerequisites have to be installed first:
# sudo apt-get install gcc xserver-xorg-core
This installs gcc and X.Org, the graphical environment server. It won’t install a window manager (such as GNOME or KDE), a graphical login (GDM or KDM), etc., just the core X server. To install the VirtualBox Guest Additions, from the virtual machine menu: Devices -> Install Guest Additions. Then:
# sudo mkdir /media/cdrom
# sudo mount -o ro /dev/sr0 /media/cdrom
# sudo /media/cdrom/VBoxLinuxAdditions-amd64.run (depends on your platform)
Make sure you get no errors and voilà! Our new test environment is ready. I took a snapshot (Machine -> Take Snapshot) so I can always return to this basic point. What if you do want a lightweight window manager? I used Fluxbox, but you can install whatever you like. To install Fluxbox:
# sudo apt-get install fluxbox
# sudo apt-get install xinit x11-utils eterm xterm
# echo fluxbox > ~/.xinitrc
The eterm and x11-utils packages allow you to set the Fluxbox background with the “fbsetbg” command. xterm is the standard terminal emulator (I guess one may argue with that, but it’s my favorite anyway). None of them is necessary. To load the graphical environment:
# startx
That’s it. I also managed to recompile the rtl8187 kernel module, but that’s outside the scope of this post. Enjoy your new test environment!