In my previous job, I built a big database of zip codes and geolocations, and of the distances between those zip codes. The server that this database lives on is getting shut down sometime in the next 24 hours. A couple of days ago, I suddenly realized I could use that database to answer a few questions I’ve had about where the A’s should be moving.
So I’ve been scrambling to get some queries done before the server goes away. I managed to get the work done once, but I didn’t get a chance to double-check anything, so take all this with a gigantic grain of “this is a first draft” salt.
* * *
The raw data I had was from the 2000 census and included:
- every 5-digit zip code in the United States (about 40,000 of them)
- the latitude and longitude of each of those zip codes, and
- population and median household income for about 30,000 of those zip codes
I’m not sure why 10,000 zip codes don’t have population and income data. Some of them probably represent entities (government offices and the like) that aren’t geographic areas with residents. But not all of them: the zip code that includes Safeco Field in Seattle was among those missing data, and Baltimore looks like it’s missing a big chunk. There’s no Canadian data either, so the Blue Jays are unrepresented, as are probably some additional Tigers and Mariners fans. The data clearly needs a real good scrubbing, so I’ll repeat my warning about its rough nature.
From the geodata, I calculated the distance between every pair of zip codes that were less than 150 miles apart.
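For anyone who wants to reproduce the distance step, here is a minimal sketch. I'm assuming the great-circle (haversine) formula and a mean Earth radius of 3,958.8 miles; the function names and the `centroids` input are made up for illustration, and this is not necessarily how the original database computed its distances.

```python
import math
from itertools import combinations

# Assumption: mean Earth radius in miles (not a value from the original database).
EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(min(1.0, a)))


def pairs_within(centroids, max_miles=150.0):
    """Return {(zip_a, zip_b): miles} for every pair closer than max_miles.

    `centroids` is a hypothetical dict mapping zip code -> (lat, lon).
    This is a brute-force pass over all pairs, fine for a sketch but slow
    for ~40,000 zip codes.
    """
    out = {}
    for (za, (lat_a, lon_a)), (zb, (lat_b, lon_b)) in combinations(
            sorted(centroids.items()), 2):
        d = haversine_miles(lat_a, lon_a, lat_b, lon_b)
        if d < max_miles:
            out[(za, zb)] = d
    return out
```

With roughly 40,000 zip codes, the brute-force loop is about 800 million pairs, so a real run would probably want to pre-filter by a latitude/longitude bounding box first.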
* * *
If you’re going to build a ballpark somewhere, you’d want to put it somewhere:
- with as many people as possible
- who have as much money as possible
- who live as near to the ballpark as possible
So I came up with a formula to reflect this. For this exercise, I don’t need to know the exact amount of money a ballpark can generate; I just need a number I can use for comparison. Median household income will do just fine, even though it’s not at all an accurate representation of how much money is available to spend on baseball.
So here’s what I did: For each zip code within 75 miles of an MLB ballpark, I multiplied the population by the median household income to get a total amount of money for that zip code. (I should probably have divided by average household size, but we’re after relative comparisons here, so it doesn’t matter too much.) Then, for each mile between that zip code and the ballpark, I subtracted 1/75th of that total. In other words, each zip code’s score is population × income × (1 − distance/75), and a ballpark’s market size is the sum of those scores.
So the closer the zip code is to the ballpark, the more of its money is assigned to the team.
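The weighting described above can be sketched in a few lines. The function names and the input shape are hypothetical, and skipping zip codes with missing population or income is my assumption about how the gaps would be handled, not something the original queries are known to have done.

```python
RADIUS_MILES = 75.0  # cutoff radius from the post


def zip_score(population, median_income, miles, radius=RADIUS_MILES):
    """Distance-weighted money for one zip code.

    Full value at 0 miles, falling off linearly to zero at `radius` miles.
    """
    if miles >= radius:
        return 0.0
    return population * median_income * (1 - miles / radius)


def market_size(zips, radius=RADIUS_MILES):
    """Sum the scores for an iterable of (population, median_income, miles).

    Assumption: rows with missing population or income (None) are skipped.
    """
    total = 0.0
    for population, median_income, miles in zips:
        if population is None or median_income is None:
            continue
        total += zip_score(population, median_income, miles, radius)
    return total
```

So a zip code 30 miles out contributes 60% of its population × income total, and one at 75 miles or beyond contributes nothing.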
Then I repeated the exercise for five potential A’s homes: the Coliseum, Victory Court, Fremont, San Jose, and Sacramento.
Once I had done that, I did it for every minor league park that was more than 75 miles from any existing MLB park, plus Portland, Honolulu, and Anchorage.
* * *
The (rough) results, for your viewing pleasure:
| zip | team / site | city | state | relative market size |
|---|---|---|---|---:|
| 90012 | Dodgers | Los Angeles | CA | $510,586,706,490 |
| 60616 | White Sox | Chicago | IL | $353,523,094,940 |
| 94107 | Giants | San Francisco | CA | $276,531,798,517 |
| 02215 | Red Sox | Boston | MA | $258,052,953,191 |
| 92101 | Padres | San Diego | CA | $119,511,331,778 |
| 63102 | Cardinals | St. Louis | MO | $95,088,871,967 |
| 33705 | Rays | St. Petersburg | FL | $86,162,065,166 |
| 64129 | Royals | Kansas City | MO | $75,617,221,577 |
| 94607 | Victory Ct | Oakland | CA | $288,464,089,740 |
| 95110 | San Jose | San Jose | CA | $244,281,690,385 |
| 95691 | Sacramento | West Sacramento | CA | $122,189,968,456 |
| 29715 | Charlotte | Fort Mill | SC | $66,474,729,100 |
| 78664 | Round Rock | Round Rock | TX | $57,730,378,193 |
| 78227 | San Antonio | San Antonio | TX | $57,705,434,009 |
| 49017 | Southwest MI | Battle Creek | MI | $54,991,532,182 |
| 27105 | Winston-Salem | Winston-Salem | NC | $54,919,270,523 |
| 89101 | Las Vegas | Las Vegas | NV | $53,809,399,974 |
| 49321 | West Michigan | Comstock Park | MI | $53,532,262,868 |
| 70003 | New Orleans | Metairie | LA | $51,061,101,559 |
| 32114 | Daytona | Daytona Beach | FL | $48,253,823,630 |
| 32940 | Brevard County | Melbourne | FL | $47,518,011,693 |
| 91730 | Rancho Cucamonga | Rancho Cucamonga | CA | $39,638,600,052 |
* * *
These results, outside of Baltimore, smell more or less right to me.
Next, though, I tried to put some measure on what happens to a market when it is shared between teams. This is where the results surprised me, enough that I think I probably screwed up somewhere.
I’ll address that in an upcoming blog post.