Filtering users by their IP address may seem like a good idea when you're beginning PHP, and even coders who should know better have been caught placing too much trust in an IP. I remember my first PHP project, rMetal, a website dedicated to various bands from the metal genre, back in my younger days. I didn't know about sessions and I didn't care to read up on them, either. As a consequence, my system's login worked like this: if your IP address matched the one in the database, you were logged in. Boy, was I in for a shock!
Today I make sure that no trust is placed in the user's IP address; it cannot be trusted. For one thing, where I live the ISP routes web traffic through proxy servers, so everybody in my local area, which stretches across 5+ fairly large towns, appears to have the same IP address when they visit a website, even though they have unique IPs for other services. This is partly for security and partly to save the tight-fisted ISP some bandwidth: the proxy serves pages from its own cache unless the target page has been modified.
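To make the problem concrete, here is a minimal sketch of the kind of IP-only login check I used on rMetal. The function name `isLoggedInByIp()` and the sample addresses are hypothetical, and the server array is passed in as a parameter (standing in for `$_SERVER`) so it can be run outside a web server. The point is that two different households behind the same web proxy present the same `REMOTE_ADDR`, so both pass.

```php
<?php
// Hypothetical IP-only "login" check: the naive approach described above.
// $storedIp is the address saved in the database when the member logged in;
// $server stands in for $_SERVER so the function can be run from the command line.
function isLoggedInByIp(string $storedIp, array $server): bool
{
    // REMOTE_ADDR is the address the TCP connection came from. Behind an ISP's
    // web proxy, that is the proxy's address, not the visitor's.
    $remoteAddr = $server['REMOTE_ADDR'] ?? '';
    return $remoteAddr !== '' && $remoteAddr === $storedIp;
}

// Two different households browsing through the same (hypothetical) proxy.
// Their browsers differ, but the IP-only check never looks at that.
$memberRequest    = ['REMOTE_ADDR' => '203.0.113.10', 'HTTP_USER_AGENT' => 'Browser A'];
$neighbourRequest = ['REMOTE_ADDR' => '203.0.113.10', 'HTTP_USER_AGENT' => 'Browser B'];

var_dump(isLoggedInByIp('203.0.113.10', $memberRequest));    // bool(true)
var_dump(isLoggedInByIp('203.0.113.10', $neighbourRequest)); // bool(true): the neighbour is "logged in" too
```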
Take the following PHP code into consideration and place it at the forefront of your mind:
PHP Code:
var_dump($_SERVER['REMOTE_ADDR']);
I'm sure the majority of you recognise what this code does. It will echo out my IP address with its data type and length. When I run the script it gives me:
PHP Code:
string(9) "127.0.0.1"
Note: I'm not really 127.0.0.1; that stops any of you little rats from getting my real IP, though!
A conservative estimate would be that 5,000 other households share this IP address with me, so relying on the IP address is clearly a big no-no! "What's the solution?" you might ask. Well...
...The honest truth is that there is no solution. We can never be 100% certain that the member visiting our website is truly unique. Sure, we have cookies, IP addresses and sessions (which usually rely on cookies), but none of these is reliable. Perhaps the best way to identify and tag a visitor is to hash several pieces of information together and construct a fingerprint identity for them.
As you may remember from this article, nearly every header sent from the client to the server is optional (HTTP/1.1 insists only on Host). Put another way, the client's browser decides whether or not to send it; beyond that, HTTP expects little more than the request line itself. The client's IP address is different: the server takes it from the TCP/IP connection rather than from a header. It can be spoofed, but a request only gets a response once TCP's three-way handshake completes, and the handshake replies go to the spoofed address, so spoofing it would be a monumental exercise in futility.
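Because any header may be missing, code that reads one should expect it to be absent. The sketch below is a hypothetical helper, `requestHeader()`, assuming PHP 7.1 or later. It relies on the CGI naming convention PHP's web server interfaces follow when copying request headers into `$_SERVER`: the name is upper-cased, dashes become underscores and `HTTP_` is prefixed, so `User-Agent` arrives as `HTTP_USER_AGENT`. The client's address, by contrast, comes from the connection and appears as `REMOTE_ADDR` with no prefix.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: returns a request header's value from a $_SERVER-style
// array, or null when the client did not send it (headers are optional).
function requestHeader(array $server, string $name): ?string
{
    // "User-Agent" -> "HTTP_USER_AGENT", following the CGI naming convention.
    $key = 'HTTP_' . strtoupper(str_replace('-', '_', $name));

    if (!isset($server[$key]) || $server[$key] === '') {
        return null; // absent or sent empty: treat both as "not provided"
    }
    return (string) $server[$key];
}

// Sample request standing in for $_SERVER.
$server = [
    'REMOTE_ADDR'     => '127.0.0.1',
    'HTTP_USER_AGENT' => 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.9) Gecko/20071025 Firefox/2.0.0.9',
];

var_dump(requestHeader($server, 'User-Agent') !== null); // bool(true)
var_dump(requestHeader($server, 'Referer'));             // NULL
```

Treating an empty header the same as a missing one is my own choice here; it matches how the rest of this article treats an empty `HTTP_USER_AGENT`.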
The HTTP_USER_AGENT parameter comes from the User-Agent header of the HTTP request; PHP extracts it and places it into the $_SERVER predefined array. It contains information on the user's browser and operating system. Mine is:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.9) Gecko/20071025 Firefox/2.0.0.9
Naturally, many people will share the same HTTP_USER_AGENT as me, but we cut the chances of a duplicate if we concatenate the IP address to it and then hash the result. Something like the following gives us a far more distinctive value:
PHP Code:
$szFingerprint = md5($_SERVER['HTTP_USER_AGENT'] . $_SERVER['REMOTE_ADDR']);
This gives me the following fingerprint identity:
a54aa1ff349280792a8ef780697f06ae
See what we've done? We've reduced the chances of creating a duplicate fingerprint many times over. Now, for the same fingerprint to be generated, the other user would have to:
- Have the same version of Firefox
- Have the same version of Windows
- Have the same IP address
- Have the same browser language
There is, of course, a good chance that if another user and I share point 3, we will also share point 4, but even so, the odds of telling us apart have improved significantly compared with naively relying on the IP address alone.
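The effect can be checked with a small sketch. `makeFingerprint()` is a hypothetical name; it hashes the same two values as the one-liner above (`HTTP_USER_AGENT` followed by `REMOTE_ADDR`) with `md5()`, but takes the server array as a parameter so it can be run from the command line. The empty-string fallbacks are my own addition to avoid warnings when a value is missing, and the user-agent strings and shared address are sample data.

```php
<?php
declare(strict_types=1);

// Hypothetical wrapper around the one-liner above: md5(user agent . IP address).
function makeFingerprint(array $server): string
{
    $userAgent  = $server['HTTP_USER_AGENT'] ?? '';
    $remoteAddr = $server['REMOTE_ADDR'] ?? '';
    return md5($userAgent . $remoteAddr); // 32 hexadecimal characters
}

// Two households behind the same (hypothetical) proxy address.
$me = [
    'REMOTE_ADDR'     => '203.0.113.10',
    'HTTP_USER_AGENT' => 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.9) Gecko/20071025 Firefox/2.0.0.9',
];
$neighbour = [
    'REMOTE_ADDR'     => '203.0.113.10',
    'HTTP_USER_AGENT' => 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)',
];

// Same IP address, different browsers: the fingerprints differ.
var_dump(makeFingerprint($me) === makeFingerprint($neighbour)); // bool(false)

// The same visitor on a later request produces the same fingerprint.
var_dump(makeFingerprint($me) === makeFingerprint($me));        // bool(true)
```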
Sadly, as touched upon earlier, header parameters are optional, so if HTTP_USER_AGENT is empty we'll merely be hashing the IP address on its own, which is a pointless exercise. Unfortunately, there's very little we can do about this other than set cookies to rat such visitors out. The good news is that almost every browser, and certainly every new browser, sends a User-Agent header, so if identifying a user is crucial, denying access to visitors who do not have HTTP_USER_AGENT set is a path you may consider:
PHP Code:
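// Refuse visitors whose User-Agent is missing or empty; otherwise the
// fingerprint would be a hash of the IP address alone.
if (empty($_SERVER['HTTP_USER_AGENT']))
{
    http_response_code(403);
    die('You must have HTTP_USER_AGENT set.');
}
$szFingerprint = md5($_SERVER['HTTP_USER_AGENT'] . $_SERVER['REMOTE_ADDR']);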
There are disadvantages as well as advantages, though. If I load up the same website in Internet Explorer instead of Mozilla Firefox, my fingerprint is different:
Internet Explorer: 36f0679b96c335e5f8694a9b8f957f61
Mozilla Firefox: a54aa1ff349280792a8ef780697f06ae
The two are clearly very different, and whichever browser I use, I will be seen as a different person, even though I have the same IP address as I did 5 seconds ago. To conclude, everybody on the Internet can be unique if they wish to be; this is where sessions stepped in to introduce some law and order into a lawless and disorderly environment. As for denying visitors without an HTTP_USER_AGENT, that would, more often than not, shut out far fewer people than have JavaScript disabled, which is around 6% according to W3Schools' browser statistics.
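One way to combine the fingerprint with sessions is to record it when the session starts and compare it on later requests, treating a mismatch as a sign that the session is being used from a different browser or address. The sketch below assumes PHP 7.1 or later; `sessionMatchesFingerprint()` is a hypothetical name, and it takes the session data as an array parameter so it can be exercised without a web server. In a real page you would pass `$_SESSION` (after calling `session_start()`) and `$_SERVER`.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: binds session data to the md5(user agent . IP) fingerprint.
// On the first request it stores the fingerprint; afterwards it reports whether
// the current request still matches the stored one.
function sessionMatchesFingerprint(array &$session, array $server): bool
{
    $current = md5(($server['HTTP_USER_AGENT'] ?? '') . ($server['REMOTE_ADDR'] ?? ''));

    if (!isset($session['fingerprint'])) {
        $session['fingerprint'] = $current; // first request: remember it
        return true;
    }
    // hash_equals() compares the two strings in constant time.
    return hash_equals($session['fingerprint'], $current);
}

// Simulated requests; in a real page $session would be $_SESSION.
$session = [];
$firefox = ['REMOTE_ADDR' => '203.0.113.10', 'HTTP_USER_AGENT' => 'Firefox/2.0.0.9'];
$ie      = ['REMOTE_ADDR' => '203.0.113.10', 'HTTP_USER_AGENT' => 'MSIE 7.0'];

var_dump(sessionMatchesFingerprint($session, $firefox)); // bool(true): fingerprint stored
var_dump(sessionMatchesFingerprint($session, $firefox)); // bool(true): same browser
var_dump(sessionMatchesFingerprint($session, $ie));      // bool(false): different browser, same IP
```

Because the fingerprint includes the IP address, a visitor whose address changes mid-session (for example, behind a pool of proxies) will also fail this check, so how you respond to a mismatch is a judgement call.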