06-08-2009, 06:50 PM   #1
Wildhoney
Audioscrobbler (Last.FM) API: Determining What's An Album

Last.FM's API may be the most informative out there, but its data still has issues, such as duplicated tracks on a particular album that differ only slightly, for example by one added exclamation mark. Filtering out those tracks may be easy, but one problem I did face when dealing with the API today was determining what constitutes an album.

Unfortunately there is no easy way to determine what constitutes an album and what's a single (SP, EP). The reason for this is clear: searching for the artist "Oasis" returns a plethora of so-called albums, the majority of which aren't albums at all.
Last.FM API: "Oasis"
When I have used the API in the past, I have decided what's an album based on the MBID (MusicBrainz ID) field. The logic was simple: if the MBID is empty then it is not an album; otherwise class it as an album. This was all fine and dandy until you look at the smaller bands. Take, for example, the post-rock band Maybeshewill.
Last.FM API: "Maybeshewill"
As Maybeshewill is a small band, there are no MBIDs for any of their albums, and therefore the aforementioned logic reveals its flaw.
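
To make the flaw concrete, here is a minimal sketch of the old MBID-only check. The function name filterByMbid, the plain-array layout, and the placeholder names and IDs are assumptions made for illustration; they are not real API output.

```php
<?php
// Hypothetical helper: keep only entries that carry a non-empty MBID.
// The "name" and "mbid" keys mirror the fields discussed above, but the
// array layout itself is an assumption.
function filterByMbid(array $aAlbums): array
{
    return array_values(array_filter($aAlbums, function (array $aAlbum): bool {
        return isset($aAlbum['mbid']) && trim($aAlbum['mbid']) !== '';
    }));
}

// Placeholder data for a well-known artist: the real album has an MBID.
$aMajor = [
    ['name' => 'Album A',  'mbid' => 'placeholder-mbid-1'],
    ['name' => 'Single B', 'mbid' => ''],
];

// Placeholder data for a small band: nothing has an MBID.
$aSmall = [
    ['name' => 'Album C', 'mbid' => ''],
    ['name' => 'Album D', 'mbid' => ''],
];

echo count(filterByMbid($aMajor)), "\n"; // 1
echo count(filterByMbid($aSmall)), "\n"; // 0
```

For the small band every entry is thrown away, genuine albums included, which is exactly the problem described above.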

Today I decided that there must be a way, based on the data available in that API, to determine what is an album. The logic I arrived at seems to work for all the artists I have tried. Of course it's not perfect, because this data is extremely variable and nothing can be taken for granted.

What I did was the following, in pseudo code:
  • Get the "reach" value of the first album, which is the most popular;
  • Get the "reach" value of each subsequent album;
  • Calculate the current album's reach as a percentage of the most popular album's reach;
  • If that percentage is less than 10%, ignore the album.

The reach value is, in essence, how popular that particular album is. Naturally, non-albums tend to be less popular than albums. I arrived at 10% by inspecting the breakdown of the percentages.

To exemplify, if the top album has a reach of 65,000, and the second album in the list has a reach of 40,000, we would use the following equation:

Code:
(40,000 / 65,000) * 100 = 61.5%
The smaller so-called albums, by contrast, have a much lower reach value, and therefore the majority of them, though not all, will fall below the 10% line.
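
The same arithmetic can be wrapped in a small helper. This is only a sketch: the function name reachPercentage is hypothetical, and the second reach value (5,000) is a made-up figure to show an entry that falls under the 10% line.

```php
<?php
// Hypothetical helper: express one album's reach as a percentage of the
// highest reach, rounded to one decimal place.
function reachPercentage(int $iReach, int $iHighestReach): float
{
    // Guard against dividing by zero when no reach data is available.
    if ($iHighestReach <= 0) {
        return 0.0;
    }

    return round($iReach / $iHighestReach * 100, 1);
}

echo reachPercentage(40000, 65000), "\n"; // 61.5 (the example above)
echo reachPercentage(5000, 65000), "\n";  // 7.7  (made-up value, below 10%)
```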

In PHP code this could be shown as the following:

php Code:
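// Assumes the albums are listed by reach, most popular first.
$iHighestReach = null;

foreach ($pAlbums->album as $pAlbum)
{
    $iReach = (int) $pAlbum->reach;

    // The first album in the list sets the baseline.
    if ($iHighestReach === null)
    {
        $iHighestReach = $iReach;
    }

    // Skip anything below 10% of the baseline (or everything, if the baseline is zero).
    if ($iHighestReach === 0 || ($iReach / $iHighestReach * 100) < 10)
    {
        continue;
    }

    /* Process the album... */
}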

You can of course alter the 10% value, as it is purely based on my personal judgement. Don't expect it to be perfect. It will miss some that are albums, and it will include some that aren't albums.
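
If you want to experiment with the cut-off, one option is to pass it in as a parameter. The sketch below is an assumption-laden variant, not the code above: the function name filterAlbumsByReach is hypothetical, it works on plain arrays rather than the API response object, the sample releases are placeholders, and it takes the highest reach with max() instead of relying on the first entry.

```php
<?php
// Hypothetical helper: keep releases whose reach is at least $fThreshold
// percent of the highest reach in the list.
function filterAlbumsByReach(array $aAlbums, float $fThreshold = 10.0): array
{
    if ($aAlbums === []) {
        return [];
    }

    // Use the largest reach as the baseline, wherever it sits in the list.
    $iHighestReach = max(array_column($aAlbums, 'reach'));

    if ($iHighestReach <= 0) {
        return [];
    }

    return array_values(array_filter(
        $aAlbums,
        function (array $aAlbum) use ($iHighestReach, $fThreshold): bool {
            return ($aAlbum['reach'] / $iHighestReach * 100) >= $fThreshold;
        }
    ));
}

// Placeholder data, not real API output.
$aAlbums = [
    ['name' => 'Release A', 'reach' => 65000],
    ['name' => 'Release B', 'reach' => 40000],
    ['name' => 'Release C', 'reach' => 5000],
    ['name' => 'Release D', 'reach' => 900],
];

// Default 10% cut-off.
echo implode(', ', array_column(filterAlbumsByReach($aAlbums), 'name')), "\n";
// Release A, Release B

// A looser 5% cut-off lets one more release through.
echo implode(', ', array_column(filterAlbumsByReach($aAlbums, 5.0), 'name')), "\n";
// Release A, Release B, Release C
```

Lowering the threshold catches more genuine albums from less popular artists at the cost of letting more singles through, so it is worth checking against a few artists you know well.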

Without making subsequent calls to clarify, however, this is perhaps the best way, at least as far as I can see, to determine what is and what isn't an album.