The Sounds of Science

June 16th, 2010

Last Friday, hsarik pointed out an interesting web site: Echo Nest.  They provide a web service that allows you to analyze and remix music.  The API also can provide information (meta-data) about music, artists, songs etc.  and has Python bindings.  If you’ve seen the “More Cowbell” website where you can upload an mp3 and have more cowbell (and more Christopher Walken) added to it, well that site uses Echo Nest and if you download the python bindings for their API, you can see the script that adds the sounds.  Personally, I’m fond of “Ob-la-di, Ob-la-da” with 80% cowbell and 20% Christopher Walken.

I started playing with the API and as a first cut thought it would be neat to use the “get_similar” function.  So for each artist, you can get the top N similar artists.  Now where can I get a list of artists I like?  Well, I could type ‘em in, but that sucks.  So I wrote a small program which:

  1. Opens the database on my iPod (or a directory of mp3 files)
  2. Finds each artist by either reading the iPod db or looking at the id3 tags in all of the files
  3. For each artist, adds a node to a graph where the area of the node is proportional to the number of songs that artist has on the iPod (or in the music folder)
  4. For each artist, finds the top 50 similar artists
  5. For all of the similar artists that are in my collection of artists, adds a graph edge between the two nodes
  6. Plots the graph
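The steps above can be sketched in a few lines of plain Python.  This is only a rough outline, not the program I actually ran: `get_similar` here is a hypothetical callable standing in for the web-service lookup, the artist names are made up, and the plotting step is left out.

```python
from collections import Counter


def build_artist_graph(song_artists, get_similar, top_n=50):
    """Return (node_sizes, edges) for the artists in a music collection.

    song_artists: iterable with one artist name per song.
    get_similar:  hypothetical callable (artist, n) -> list of similar
                  artist names; it stands in for the real service call.
    """
    # Steps 1-3: one node per artist, sized by the number of songs.
    node_sizes = Counter(song_artists)

    # Steps 4-5: keep an edge only if the similar artist is also in the
    # collection.
    edges = set()
    for artist in node_sizes:
        for other in get_similar(artist, top_n):
            if other in node_sizes and other != artist:
                # Store edges undirected so A-B and B-A collapse to one.
                edges.add(tuple(sorted((artist, other))))
    return dict(node_sizes), sorted(edges)


# Toy stand-in for the similarity service (made-up data, not real results).
_FAKE_SIMILAR = {
    "Artist A": ["Artist B", "Artist Z"],
    "Artist B": ["Artist A"],
    "Artist C": [],
}


def fake_get_similar(artist, n):
    return _FAKE_SIMILAR.get(artist, [])[:n]


songs = ["Artist A", "Artist A", "Artist B", "Artist C"]
sizes, edges = build_artist_graph(songs, fake_get_similar)
print(sizes)
print(edges)
```

Storing each edge as a sorted pair means that "A is similar to B" and "B is similar to A" produce a single edge, which is what you want for an undirected plot.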

What can I say, I’ve been working on a fair amount of graph theory at work recently.  So after processing my iPod, I came up with the following graph of my current music:

Okay, that’s pretty cool.  Almost completely illegible, but cool.  FWIW, the graph has 15 connected components; unfortunately, 13 of them are “singles” (not connected to anything), with one pair (Louis Armstrong paired with Louis Armstrong and Duke Ellington).  Fortunately, the graphing tool I use (igraph) has built-in tools for doing community analysis (using the leading eigenvector method), i.e., we can automatically find tightly coupled subgraphs.  A few examples from the 25 or so communities:

which arguably correspond to “Indie,”  “Classic Rock,”  “Jam Bands,”  “Guitar Gods,” and “Alternative.”  If I processed my complete music database, I suspect we would wind up with several other communities, e.g., Blues.  But since Robert Johnson is the only blues I’ve got on there right now… he’s in a class by himself.
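For the curious, counting connected components (and the "singles" among them) doesn't need anything fancy.  Here is a minimal breadth-first sketch in plain Python, with made-up node names; igraph has its own routines for this and for the leading eigenvector communities, so treat this as an illustration rather than what the tool does internally.

```python
from collections import defaultdict, deque


def connected_components(nodes, edges):
    """Group nodes into connected components using breadth-first search."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(sorted(component))
    return components


# Hypothetical four-artist collection with a single edge.
toy_nodes = ["A", "B", "C", "D"]
toy_edges = [("A", "B")]
components = connected_components(toy_nodes, toy_edges)
singles = [c for c in components if len(c) == 1]
print(len(components), "components,", len(singles), "singles")
```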

I suppose it goes w/o saying, that my musical tastes aren’t everyone’s and that if you don’t like my musical tastes, you can keep it to yourself or go DIAF :-)

So, what’s next?  I was talking with M from my office and we’ve come up with another interesting project for the Echo Nest API.  This one a) uses the audio analysis functions, and b) if we do it right might cause someone to send us a cease and desist.  So, win all the way around.

The End

Photography workflow

June 3rd, 2010

Four years ago, I made the switch to digital SLR photography.  The primary reason was the workflow.  When I shot slide film, I would have to get the film developed, look at each image, scan the ones I liked, correct the color balance and then manually remove the dust spots from the scanned images.

When I first got the digital camera, the workflow became: auto-correct the color balance using the Nikon’s color profile, then select the images I liked.  Great!

Unfortunately, over the years, my SLR has gotten dust on the sensor, because I was doing what Nikon said and not mucking with the sensor to try to clean it.  So, first thing is that I should ignore Nikon and actually clean the sensor.  But the second thing is that this has really screwed with my workflow.  Last year, after identifying the “good” images, I had to manually go through them and use the Heal tool in the GIMP in order to get rid of a few dust spots.  Well, dust is cumulative and this year it was worse than ever.  In particular, the dust was more noticeable because I was shooting a lot of waterfalls… long exposures with a small aperture – dust city.  Take a look at the following:

To some extent or another, that’s on every single image I took while K and I were on vacation.

I could repeat my old workflow, but that would take days.  New idea:  there is a tool in the GIMP called the Smart Remove Selection.  It takes a selected bit of the image and replaces it with textures from the surrounding area.  It’s comparable to Photoshop’s content-aware fill.  So, if I can select all of the visible dust, I can clean it at one time.  But that’s still slow.

Instead, I selected all of the dust from the image above, grew the selection by 10 pixels, converted it to a path and then saved the path as an SVG file.  Since the dust is at the same location in each image, a single dust file is relevant to all of my images.

Now all I have to do is to open an image, import the path, convert the path to a selection and apply the smart remove.  That’s a little better, but still means that I have to touch each file manually.

Enter GIMP scripting.  Last night, I wrote a script that takes a file glob, converts it to a list of files, and for each file automatically removes the dust and color corrects the image.  It still takes about a minute per file, but it’s completely automated.  Unfortunately, the first version of the script only handled horizontal images.  But since I always turn the camera clockwise when I shoot vertically, I was able to modify it to rotate the image appropriately, apply the dust removal and then rotate the image back if the height of the image is greater than the width.
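The control flow is simple enough to sketch outside of GIMP.  The following is not the actual script: the step names are just labels I made up for illustration, no GIMP calls are shown, and running the color correction last is an assumption.

```python
import glob


def expand_pattern(pattern):
    """Turn a file glob such as 'vacation/*.jpg' into a sorted file list."""
    return sorted(glob.glob(pattern))


def plan_steps(width, height):
    """Return the ordered processing steps for one image.

    The dust path was recorded on a horizontal image, so a vertical image
    (height > width) is rotated to match before the dust removal and
    rotated back afterwards.
    """
    if height > width:
        return ["rotate_to_landscape", "remove_dust", "rotate_back",
                "color_correct"]
    return ["remove_dust", "color_correct"]


print(plan_steps(3000, 2000))  # horizontal image
print(plan_steps(2000, 3000))  # vertical image
```

Because the camera is always turned the same way for vertical shots, a single fixed rotation is enough; if that weren't true, the script would need some other way to tell which way is up.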

The results are pretty great for a system I can just run overnight:

The End

What, ahem, WTF is wrong with Arizona?

April 27th, 2010

Presumably, most people have now heard that Arizona has passed a new law respecting immigration enforcement.  Reactions, as one might expect, are mixed depending on one’s thoughts on immigration, and more specifically, the potential for illegal immigration.  Mexico has issued a travel warning for its citizens, advising them not to visit the state.  Various AZ mayors have decried the law.  Some right wingers love it, while others hate it, and of course George Will is still an asshole.  And finally, even though the law doesn’t go into effect for another three months or so, we can already see what the future will be like for Hispanics in Arizona.

Honestly, I can’t believe that we’re talking about a state where you might expect law enforcement to request your papers.

I did have a few questions, including what exactly does the law require?  And how will it be enforced?  In a nutshell, the law:

Requires officials and agencies to reasonably attempt to determine the immigration status of a person involved in a lawful contact where reasonable suspicion exists regarding the immigration status of the person, except if the determination may hinder or obstruct an investigation.

Okay, so, how does one form a reasonable suspicion?  Well, good for Arizona, the law further:

Stipulates that a law enforcement official or agency cannot solely consider race, color or national origin when implementing these provisions, except as permitted by the U.S. or Arizona Constitution.

So, I don’t know what is permitted by the Arizona Constitution, maybe all of those forms of profiling, maybe none.  But one final question, given that the mayor of Phoenix doesn’t support the law, how do you guarantee that it gets enforced?  Well, a citizen can sue if there’s a policy that doesn’t support enforcement:

Allows a person who is a legal resident of this state to bring an action in superior court to challenge officials and agencies of the state, counties, cities, towns or other political subdivisions that adopt or implement a policy that limits or restricts the enforcement of federal immigration laws to less than the full extent permitted by federal law.

Requires the court to order that any violating entity pay a civil penalty of at least $1,000 and not to exceed $5,000 for each day that the policy has remained in effect after it has been found to be violating these provisions.

A few thoughts:

  1. It’s not clear to me how one forms a reasonable suspicion about the immigration status of the person, except given their ethnic background
  2. It’s not clear to me that ethnic background is even restricted as a category for consideration in the law, based on Latinos being Caucasian and questions about what is permitted under Arizona’s constitution
  3. Given #1 and #2 above, I don’t think people realize how ineffective ethnicity is in determining legal status

Fortunately, statistics gives us a good answer to #3.  For the following, let’s consider that L indicates Latino, and I represents illegal.

Bayes’ rule tells us that the probability of being illegal given that you are Latino [ Pr(I | L) ] is the probability of being illegal (the prior, Pr(I)) times the probability of being Latino given that you are illegal (the likelihood, Pr(L|I)), divided by the evidence, i.e., the overall probability of being Latino, whether illegal or not [Pr(L|I)*Pr(I) + Pr(L| not I)*Pr(not I)]

so, Pr(I|L) = Pr(L|I) * Pr(I) / (Pr(L|I) * Pr(I) + Pr(L| not I) * Pr(not I))

We can quantify this somewhat, from StateMaster

The population of AZ = 5,939,292

Legal Hispanic/Latino population of AZ = 1,803,377

Estimated number of illegals in AZ = 283,000

If we assume that all illegals are Hispanic, then:

P(I) = 283,000 / 5,939,292 = .04765

P(L|I) = 1.0

P(L| not I) = 1,803,377 / 5,939,292 = .30364

So, the probability of being illegal given that you are Latino is:   .14147 or ~14%.  Which in my mind is no reason to form a suspicion.  Hell, more than 14% of the population are pot smokers, you wouldn’t want to give the police authority to stop and arrest everyone to find that subset who are.
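If you want to check the arithmetic, here is a small Python sketch of the calculation.  The function and variable names are mine; the inputs are the StateMaster figures quoted above, and P(L| not I) uses the same total-population denominator as in the text.

```python
def posterior_illegal_given_latino(p_i, p_l_given_i, p_l_given_not_i):
    """Bayes' rule: Pr(I|L) = Pr(L|I)Pr(I) / Pr(L)."""
    # Evidence Pr(L): Latino and illegal, plus Latino and not illegal.
    evidence = p_l_given_i * p_i + p_l_given_not_i * (1.0 - p_i)
    return p_l_given_i * p_i / evidence


population = 5_939_292
legal_latino = 1_803_377
illegal = 283_000

p_i = illegal / population
# Same approximation as in the text: divide by the whole population.
p_l_given_not_i = legal_latino / population

posterior = posterior_illegal_given_latino(p_i, 1.0, p_l_given_not_i)
print(f"Pr(I|L) with Pr(L|I) = 1.0: {posterior:.5f}")

# A lower likelihood, e.g. 0.8, pulls the posterior down further.
posterior_08 = posterior_illegal_given_latino(p_i, 0.8, p_l_given_not_i)
print(f"Pr(I|L) with Pr(L|I) = 0.8: {posterior_08:.3f}")
```

Passing the likelihood in as a parameter makes it easy to see how sensitive the answer is to the assumption that all illegals are Latino.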

CAVEATS AND NOTES

  • The data above are from 2000 and may not be current; however, illegal immigration shows a strong economic correlation and the economy is down compared to the boom year of 2000, so the numbers are probably in the right ballpark
  • I don’t believe that Pr(L  | I) = 1.0.  This says that all illegals are Latinos.  That’s bull.  There are plenty of Asian and European illegals.  Adjusting this probability down will significantly decrease Pr(I | L).  For example, assuming P(L|I) = 0.8 results in P(I|L) = ~.116.
  • The above analysis assumes that Bayes’ law is true, but if it isn’t then we’re all seriously screwed.
  • Finally, I would be uncomfortable with racial profiling even if Pr(I|L) > 0.5.  It’s just not America when the cops stop and ask you for your citizenship papers.  I can think of a few places where that did occur, but won’t risk the Godwin retraction by invoking them.

So, what to do?  I was challenged a few years ago as to my solution to the illegal immigrant “problem.”  My first response is that you are assuming it’s a problem.  After all, studies have shown that low-wage legal and illegal immigrants actually grow an economy.  Moreover, since many pay into Social Security and Medicare, without receiving benefits, that helps those programs.  OTOH, it’s not fair to have them pay in without receiving services; moreover, the social safety net should be expanded to help all those in our community.  So, in that sense illegal immigration is a problem.   However, the solution is straightforward.  Enforce current laws restricting a business’s ability to hire illegals.  Illegals come here due to the draw of jobs.  Businesses love ‘em because they often work below minimum wage, and don’t complain about things like OSHA requirements.  Fine – regulate the businesses better and we’ll have fewer illegal immigrants.

The End

Trust me, I’m bearded; or

April 22nd, 2010

“I’m the bearded lady! Who are you, one of the freaks?”

This explains everything. Apparently those of us that are blessed with beards are deemed to be more credible than our clean shaven brethren. At least that seems to be the case for neat, medium length beards. And unless we’re trying to sell underwear… go figure.

I wonder if there’s a trustworthiness scale for beard-type? Inter-webz to the rescue:

Please note that my full beard is very trustworthy. My father’s “cop mustache” aka “The Burt Reynolds” is mildly trustworthy. And my brother’s, circa 2000, soul patch put him somewhere between threatening and dangerous.

The End

Damn liberals

March 30th, 2010

and their elitist spelling and grammar

The End

In which my denial of free will receives support from Science [tm]

March 30th, 2010

I’ve long maintained that free will is an illusion.  That the mind arises from the physicality of the brain and that there is no room for an active will separate from the physical processes of the brain.  That’s not to say that I’m a fatalist.  I don’t believe that thoughts are deterministic, let alone subject to perfect prediction.  My position boils down to the brain as a (gigantic) black box containing an uncountable number of states.  Input from the senses changes the state of the brain and occasionally results in actions.

Assuming (which I don’t) that the processes of the brain were completely deterministic, chaos theory tells us that they would not be predictable (what’s the solution to the three-body problem? [other than a king-sized bed]).  Moreover, the processes themselves are dependent on physicalities small enough that quantum effects are relevant and therefore the state changes contain a strong stochastic component.

Philosophically speaking, none of this affects the way we should live.  Education, punishment, personal interaction all affect the state of the brain and are therefore worthwhile [and inform my position that the real purpose of the criminal code should be rehabilitation and not warehousing, punishment or societal retribution... but that's a post for another time].

Yesterday, I heard the coolest story on NPR.  In a nutshell, moral judgements are apparently influenced by the right temporoparietal junction so that a magnetic pulse disrupting that region of the brain affects those judgements.  In the study published in the Proceedings of the National Academy of Sciences (PNAS), the researchers told one of four stories to the participants.  The stories addressed permutations of effect (neutral or negative) and intent (neutral or negative).  So, one story described an unintentional (neutral intent) poisoning resulting in death (negative effect), another described an intentional (negative intention) failed poisoning (neutral effect), etc.

When making moral judgements, adults generally consider intention.  So if you didn’t intend to poison someone and did, it’s understandable; whereas if you intended to poison and failed, you are still morally culpable.  This is exactly what the researchers found in their controls.  However, after disrupting the right temporoparietal junction, participants started making moral judgements based on the effect.  You intended to kill someone, but failed?  No problem, the person is still alive.  You didn’t mean to kill someone and did?  You bad person, someone died.  Apparently, this is common in children before they learn to make moral judgements based on intention.

So, in a nutshell, temporarily altering the physicality of the brain affects people’s thoughts with respect to moral judgements.  I’ll consider that support for my position that the mind arises from the physicality of the brain and any belief you have in a free will separate from those physical processes is an illusion.

The End

Crap… I’m the pig

March 25th, 2010

Old bit of folk wisdom that has served me well over the years:  never wrestle a pig; you both get dirty and the pig enjoys it.  Keeping that in mind has kept me out of all sorts of trouble over the years.  The only problem?  I’m apparently the pig in debates about the new health reform law :-)

The End

Model update (updated!, updated again)

March 22nd, 2010

Earlier, I posted my current model for predicting the NCAA tournament.  Since the whole thing is probabilistic, I figured that I would test it out against the current NCAA standings.  I considered four models:

  1. The one that I described
  2. A random selection of which team would win (50/50 chance)
  3. Always picking the top seeded team
  4. A model suggested by a colleague at work

For each model, I ran 10,000 tests and compared them to the current NCAA tournament results, counting the scores for each test.  Results are:

The X axis is the score (0-64 at this point), the Y axis is the number of test runs (out of 10k) that achieved that score.  The number in the legend is the expected value (score) for each model.  As you can see, my model had the [second] highest expected value.  Choosing the top seeded team was the best [see the 2nd update].  Choosing randomly was the worst [see update], and my colleague’s model (cyan) was between my model and the random model.  Not bad.  I’ll update after the next two rounds of the tournament.

Update: one interesting thing is that this suggests that there was still a lot of luck in my ESPN pick.  Only about 0.5% of my model runs were as good as that one.

Update 2: So, I’m lying in bed when it occurs to me that I’m an idiot… the team with the *lowest* seed wins a game in Model 3.  This is why I say I don’t really know basketball.
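The comparison harness described above is easy to sketch.  This is not the code I ran: the game results below are made-up placeholders, only two of the four models are shown, and the score is simply the number of correct picks.  Note that `pick_best_seed` treats the lowest seed number as the top seed, per Update 2.

```python
import random
from collections import Counter

# Hypothetical results: (seed_a, seed_b, winner), winner is "a" or "b".
GAMES = [(1, 16, "a"), (8, 9, "b"), (5, 12, "b"), (4, 13, "a")]


def pick_random(seed_a, seed_b, rng):
    """Model 2: a 50/50 coin flip."""
    return "a" if rng.random() < 0.5 else "b"


def pick_best_seed(seed_a, seed_b, rng):
    """Model 3: the team with the lowest seed number always wins."""
    return "a" if seed_a < seed_b else "b"


def score_histogram(picker, games, trials, rng):
    """Run the picker `trials` times and count how often each score occurs."""
    histogram = Counter()
    for _ in range(trials):
        score = sum(picker(a, b, rng) == winner for a, b, winner in games)
        histogram[score] += 1
    return histogram


def expected_score(histogram):
    """Average score over all test runs."""
    total = sum(histogram.values())
    return sum(score * count for score, count in histogram.items()) / total


rng = random.Random(0)
for name, picker in [("random", pick_random), ("best seed", pick_best_seed)]:
    hist = score_histogram(picker, GAMES, 10_000, rng)
    print(name, expected_score(hist))
```

A deterministic model like the seed picker always lands on a single score, so its "histogram" is one spike; the probabilistic models spread out, which is what the plot shows.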

The End

They laughed at my theories!

March 22nd, 2010

They laughed at my theories.  They threw tomatoes when I presented my paper at the academy!  Tomatoes I tell you!  My minions cower in terror, shrinking in fright from the very ideas contained herein!  But I will show them!  I will PROVE IT TO THEM ONCE AND FOR ALL.  The FOOLS, I WILL DESTROY THEM!! MWAHAHAHAAAA! (ask me how)

Oh, sorry.  Where was I?  Apparently, there’s this basketball thing going on.  Some sort of NCAA tournament that will prove who has the best basketball team.  But what if it doesn’t?  What if it’s all just arbitrary?  Could it be that the chances of any team winning a game are not deterministic, but rather stochastic?  I’ll admit that I don’t know that much about basketball.  I mean, I played the sport in junior high.  I do know the rules.  And I even think that it’s a pretty game.  But I don’t follow the ins and outs of a particular season.

So what’s a guy to do when he doesn’t really follow basketball, but lives in NC where bball is life and it’s bracket time?

You model it.   Which is exactly what I did.

The basic model:

  1. Compute a team’s wins minus their losses; I’m sure there’s a word for this, but let’s call it demonstrated strength (D)
  2. For a given match-up, take a draw from a Beta distribution parameterized by each team’s demonstrated strength (D1 and D2)
  3. The resulting draw is the probability that the team representing the first parameter wins
  4. Draw from a uniform random variable to predict if that team actually will win
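In Python, the four steps above come out to just a few lines.  This is a sketch using the standard library's `random.betavariate`; the function names are mine, and it assumes both teams have a positive D (more wins than losses), since a Beta distribution needs positive parameters.

```python
import random


def demonstrated_strength(wins, losses):
    """Step 1: wins minus losses."""
    return wins - losses


def first_team_wins(d1, d2, rng):
    """Steps 2-4: simulate one game between teams with strengths d1, d2."""
    if d1 <= 0 or d2 <= 0:
        raise ValueError("Beta parameters must be positive")
    # Steps 2-3: draw the probability that the first team wins.
    p_first_wins = rng.betavariate(d1, d2)
    # Step 4: a uniform draw decides the actual outcome.
    return rng.random() < p_first_wins


rng = random.Random(42)
d1 = demonstrated_strength(20, 10)  # D1 = 10
d2 = demonstrated_strength(25, 5)   # D2 = 20
trials = 20_000
wins = sum(first_team_wins(d1, d2, rng) for _ in range(trials))
print("fraction of games won by team 1:", wins / trials)
```

Over many simulated games the first team's win rate should settle near D1/(D1+D2), which is one third for this pair.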

There are some flaws with the model, the two obvious ones:

  1. Different teams have different schedules, so one team with a 30-5 record might be a lot better than another with a 30-5 record in a different conference (I’m looking at you SEC)
  2. It’s not clear that you should parameterize directly on the demonstrated strengths.  There should probably be a scaling factor in there, so that rather than drawing from Beta(D1, D2), you draw from Beta(alpha*D1, alpha*D2)

But this is close enough.  The nice features of the model are:

  1. The expected probability that a team will win is D1/(D1+D2).  So, a team whose wins outnumber their losses by 10 will have an expected probability of winning of 50% when playing against another team with D2=10, and only a 33% chance of winning when playing against someone with D2=20
  2. The closer two teams’ demonstrated strength is to zero, the broader the probability distribution is.  This reflects added uncertainty for two teams who win only slightly more often than they lose.
  3. The larger two teams’ demonstrated strengths are, the narrower the probability distribution is.  For example, D1=20, D2=40 has the same expected probability as D1=10, D2=20; but because this is a more common pattern for the two teams, we don’t have the same variance.
  4. This is actually pretty rigorous in Bayesian terms.  Throughout the season, we can update the posterior distribution of the probability of winning based on the prior distribution and the most recent game.
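The first three features fall straight out of the standard formulas for the mean and variance of a Beta distribution.  A quick sketch to check the numbers quoted above (the function names are mine):

```python
def beta_mean(d1, d2):
    """Expected probability that the first team wins under Beta(d1, d2)."""
    return d1 / (d1 + d2)


def beta_variance(d1, d2):
    """Variance of Beta(d1, d2): d1*d2 / ((d1+d2)^2 * (d1+d2+1))."""
    total = d1 + d2
    return (d1 * d2) / (total ** 2 * (total + 1))


for d1, d2 in [(10, 10), (10, 20), (20, 40)]:
    print(d1, d2, round(beta_mean(d1, d2), 3), round(beta_variance(d1, d2), 5))
```

Doubling both strengths leaves the mean alone but roughly halves the variance, which is exactly the "more evidence, narrower distribution" behavior described in the list.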

So, how well does the model work?  Good question.  I used it on ESPN, and it’s currently ranked in the 92.9th percentile, i.e., better than almost 93% of all ESPN brackets.  All of my final four teams are still alive, and in general, the model predicted several of the biggest upsets in the tournament (e.g., Murray State vs Vanderbilt!).  That said, this is just one random draw from the model.  To test it further, I would like to go through a whole season of games and figure out if the probabilities of winning correspond to the statistics of a Beta distribution for the game’s D1 and D2.  Moreover, I would like to infer the alpha parameter that I mention above.

If the model appears accurate, and we can properly infer alpha, then we get a probabilistic assessment of how feasible it is to even pick tournament champions.  It may just be that at the end of the day, it comes down to luck.

The End

Holy crap… yes we did!

March 22nd, 2010

Almost three months ago to the day, I wrote about the Senate health care reform (HCR) bill, how they had achieved cloture and would vote on Christmas eve.  Since December, things haven’t looked all that good for HCR.  A weak candidate in Massachusetts lost to a Republican underwear model (not that there’s anything wrong with that), and the Democrats started doing the Democratic thing, which mostly consists of herding all of the cats into a circle and giving them guns to take shots at each other [1].  At one point in late January, the chances of any sort of HCR passing were very close to zero (Intrade was giving it around 22%).

Since then, President Obama has gotten more involved, and Nancy Pelosi (who love her or hate her will go down in history as one of the most effective Speakers of the House in recent memory) started working on her colleagues and the odds went up significantly.  In the past week, it looked almost certain that the House would pass the Senate’s bill and then fix the worst budgetary issues in reconciliation.  It was looking so certain, that the ignorant cretins in the teaparty were out in force, spitting and hurling racist and homophobic comments at legislators.

In spite of all that happened, last night the House did vote to pass the Senate bill.  Then came a Republican motion to recommit the reconciliation bill in an effort to spike the whole thing by driving a wedge between the pro-choice and anti-abortion wings of the Democratic party.  That failed after Bart Stupak gave an impassioned speech saying that he believed that the current Senate language plus the president’s executive order did uphold the Hyde amendment and that the bill was pro-life.  In his words, the bill was pro-life because it not only protected children before they were born, but it helped to ensure that their mothers received pre- and post-natal care, that the children would have insurance and that we know that children and families with insurance are healthier than those without.

Over the past few months, I’ve called Stupak a wanker on more than one occasion, but last night he stepped up and helped to pass health care reform for everyone.  After the vote to recommit, I went to bed (it was after 11pm and I was a bit tired), but the reconciliation bill was voted upon and also passed!

What’s next?  Well, the Senate will probably pass the reconciliation bill today.  That will clean up the crap that they had to stick into the bill in order to overcome a Republican filibuster.  The President will sign the bill Tuesday.  Then we’ll start seeing some changes.  The bill will begin to close the doughnut hole for drug coverage that the Republicans put into Medicare Part D.  It will begin to limit the insurance companies’ ability to shaft policy holders.  And by 2014, we’ll see the mandate that everyone must have insurance coverage, even if it is subsidized for the poor. Sometime between now and 2014, Democrats will hopefully start to improve the bill.  We still may not get to single payer any time soon, but we might get a public option.

From my standpoint, not too much will change.  I’ll continue to receive insurance through my company.  The Congressional Budget Office (CBO) projects that my company’s costs for insurance will go down about 3%.  Best of all, I stop having to worry about losing insurance if I lose my job or decide to change jobs.  Hell, this even gives me some freedom to consider starting my own business without worrying as much about how to afford health insurance.  All in all, passing HCR was an amazing effort and I’m proud to have watched it happen.



[1] FWIW, this is why I still consider myself to be an Independent, even though I almost always vote Democratic – the Democrats are just too fearful of the political consequences of their own popular platform planks?!  Personally, I prefer a much more muscular liberal set of policies than the Democrats are usually willing to consider… even if they agree that those policies would be better for the country.

The End