TalkPHP


Wildhoney 06-08-2009 06:50 PM

Audioscrobbler (Last.FM) API: Determining What's An Album
 
Last.FM's API may be the most informative out there, but its data still has issues, such as duplicated tracks on a particular album that differ only slightly, for example by one added exclamation mark. Filtering out those tracks may be easy, but one problem I did face when dealing with the API today was determining what constitutes an album.
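As a rough illustration of the duplicate-track filtering mentioned above, here is a minimal sketch. It assumes the track names have already been pulled out of the response into a plain array. Both function names are hypothetical helpers and not part of the Last.FM API, and the lower-casing only covers ASCII letters.

```php
<?php
/**
 * Reduce a track name to a comparison key by lower-casing it and dropping
 * punctuation, so that "Wonderwall" and "Wonderwall!" produce the same key.
 * Hypothetical helper; strtolower() only lower-cases ASCII letters.
 */
function normaliseTrackName(string $szName): string
{
    // Keep letters, digits and spaces; remove everything else.
    $szKey = preg_replace('/[^\p{L}\p{N} ]+/u', '', strtolower($szName));

    // Collapse repeated spaces and trim the ends.
    return trim(preg_replace('/ +/', ' ', $szKey));
}

/**
 * Keep the first occurrence of each track, judged by its normalised name.
 */
function filterDuplicateTracks(array $aTracks): array
{
    $aSeen   = array();
    $aUnique = array();

    foreach ($aTracks as $szTrack)
    {
        $szKey = normaliseTrackName($szTrack);

        // A name made up purely of punctuation keeps its original form as key.
        if ($szKey === '')
        {
            $szKey = $szTrack;
        }

        if (isset($aSeen[$szKey]))
        {
            continue;
        }

        $aSeen[$szKey] = true;
        $aUnique[]     = $szTrack;
    }

    return $aUnique;
}

// Made-up track list containing near-duplicates.
$aTracks = filterDuplicateTracks(array('Wonderwall', 'Wonderwall!', 'wonderwall ', 'Supersonic'));
```

This is deliberately aggressive: two genuinely different tracks whose names differ only in punctuation would be merged as well, so it may need loosening for real data.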

Unfortunately there is no easy way to determine what constitutes an album and what's a single (SP, EP). The reason for this is clear: searching for the artist "Oasis" returns a plethora of so-called albums, the majority of which aren't albums at all.
Last.FM API: "Oasis"
When I have used the API in the past, I have decided what's an album based on the MBID (MusicBrainz identifier) field. The logic was simple: if the MBID is empty then it is not an album, otherwise class it as an album. This was all fine and dandy until you look at the smaller bands. Take, for example, the post-rock band Maybeshewill.
Last.FM API: "Maybeshewill"
As Maybeshewill is a small band, there are no MBIDs for any of their albums, and therefore the aforementioned logic reveals its flaw.
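To make the flaw concrete, here is a minimal sketch of the MBID check. It assumes each entry exposes "name" and "mbid" properties, much as a SimpleXML album node would. The property names, the function name and the sample entries are all assumptions for illustration, not real API data.

```php
<?php
/**
 * Return the names of the entries that carry a non-empty MBID.
 * Hypothetical helper; assumes "name" and "mbid" properties on each entry.
 */
function filterByMbid($pAlbums): array
{
    $aNames = array();

    foreach ($pAlbums as $pAlbum)
    {
        // An empty MBID is taken to mean "not an album".
        if (trim((string) $pAlbum->mbid) !== '')
        {
            $aNames[] = (string) $pAlbum->name;
        }
    }

    return $aNames;
}

// Made-up entries for a larger act: the real album has an MBID.
$aLargeBand = array(
    (object) array('name' => 'Studio Album', 'mbid' => 'placeholder-mbid'),
    (object) array('name' => 'Promo Single', 'mbid' => ''),
);

// Made-up entries for a smaller act: nothing has an MBID at all.
$aSmallBand = array(
    (object) array('name' => 'Studio Album', 'mbid' => ''),
    (object) array('name' => 'Promo Single', 'mbid' => ''),
);

$aLargeResult = filterByMbid($aLargeBand); // only the studio album survives
$aSmallResult = filterByMbid($aSmallBand); // nothing survives, albums included
```

For the smaller act the check throws away every entry, genuine albums and all, which is exactly the problem described above.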

Today I decided that there must be a way based on the data available in that API to determine what is an album. The logic I arrived at seems to work for all the artists I have tried. Of course it's not perfect because this data is extremely variable, and nothing can be taken for granted.

What I did was the following, in pseudo code:
  • Get the "reach" value of the first album, which is taken to be the most popular;
  • Get the "reach" value of each subsequent album;
  • Calculate the current album's reach as a percentage of the most popular album's reach;
  • If that percentage is less than 10% then ignore the album.

The reach value is, in essence, how popular that particular album is. Naturally, non-albums tend to be less popular than albums. 10% is a value that was arrived at through some inspection of the breakdown of the percentages.

To exemplify, if the top album has a reach of 65,000, and the second album in the list has a reach of 40,000, we would use the following calculation:

Code:

(40,000 / 65,000) * 100 = 61.5%

Whereas the smaller so-called albums would have a much smaller reach value, and therefore the majority, though not all, will fall below the 10% line.
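For inspecting the breakdown of percentages yourself, a small sketch along these lines may help. The function name is hypothetical. The figures 65,000 and 40,000 are taken from the calculation above, and the two smaller reach values are made up for illustration.

```php
<?php
/**
 * Express each reach value as a percentage of the first value in the list,
 * rounded to one decimal place. Assumes the list is ordered most popular first.
 */
function reachBreakdown(array $aReach): array
{
    $aPercentages = array();

    // reset() returns false for an empty array, which casts to 0.
    $iTop = (int) reset($aReach);

    foreach ($aReach as $iReach)
    {
        // Guard against a zero top value to avoid division by zero.
        $aPercentages[] = $iTop > 0 ? round($iReach / $iTop * 100, 1) : 0.0;
    }

    return $aPercentages;
}

// The last two reach values are invented examples of "so-called albums".
$aBreakdown = reachBreakdown(array(65000, 40000, 5200, 1300));

foreach ($aBreakdown as $fPercentage)
{
    echo $fPercentage, "%\n";
}
```

With these numbers the first two entries sit well above the 10% line and the last two fall below it.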

In PHP code this could be shown as the following:

php Code:
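$iHighestReach = null;

foreach ($pAlbums->album as $pAlbum)
{
    $iReach = (int) $pAlbum->reach;

    /* The first album in the list is taken to be the most popular. */
    if ($iHighestReach === null)
    {
        $iHighestReach = $iReach;
    }

    /* Skip anything with less than 10% of the top album's reach. */
    if ($iHighestReach > 0 && ($iReach / $iHighestReach * 100) < 10)
    {
        continue;
    }

    /* Process the album... */
}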

You can of course alter the 10% value, as it is based purely on my personal judgement. Don't expect it to be perfect: it will miss some that are albums, and it will include some that aren't.
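If you expect to tune the cut-off, one option is to wrap the check in a function that takes the threshold as a parameter. This is only a sketch: the function name is hypothetical, it assumes the list is ordered most popular first and that each entry has a "reach" property, and the sample entries are made up.

```php
<?php
/**
 * Return the entries whose reach is at least $fThreshold percent of the
 * first entry's reach. Hypothetical helper, not part of any API.
 */
function filterAlbumsByReach($pAlbums, float $fThreshold = 10.0): array
{
    $aAlbums       = array();
    $iHighestReach = null;

    foreach ($pAlbums as $pAlbum)
    {
        $iReach = (int) $pAlbum->reach;

        // The first entry sets the benchmark.
        if ($iHighestReach === null)
        {
            $iHighestReach = $iReach;
        }

        // Skip entries below the threshold; a zero benchmark skips nothing.
        if ($iHighestReach > 0 && ($iReach / $iHighestReach * 100) < $fThreshold)
        {
            continue;
        }

        $aAlbums[] = $pAlbum;
    }

    return $aAlbums;
}

// Made-up entries, ordered by reach.
$aEntries = array(
    (object) array('name' => 'Album A',  'reach' => 65000),
    (object) array('name' => 'Album B',  'reach' => 40000),
    (object) array('name' => 'Single C', 'reach' => 5200),
    (object) array('name' => 'Single D', 'reach' => 1300),
);

$aDefault = filterAlbumsByReach($aEntries);      // 10% cut-off
$aLenient = filterAlbumsByReach($aEntries, 5.0); // 5% cut-off lets "Single C" through
```

Lowering the threshold catches more genuine albums from quieter corners of a discography at the cost of letting more singles through, so it is worth checking against a few artists you know well.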

Without making subsequent calls to clarify, however, this is perhaps the best way, at least insofar as I can see, to determine what is and what isn't an album.


All times are GMT.