Snooth Blog

Mired in data

Posted by Philip James, May 17, 2007.

As we get closer to launch I'm excited to be able to talk more about the site.

We spent a lot of time these last few weeks cleaning up the data. We'd passed 350,000 wines and 1.2 million ratings and we had a backlog of retailers' feeds to integrate. It was clear we needed to step back and take the time to consolidate some of these records.

It has taken us several weeks and hundreds of hours of tweaking and testing, but we finally have an algorithm that we're proud of. We ran it yesterday and it caught and merged around 40,000 wines. We'll continue to refine it over time, but the bulk of the work has been done.

What this means is that if one retailer calls a wine "Bodega Benegas 2005 Chardonnay Reserve" and another retailer calls it "05 Benegas Chard Reserva", you won't see both records next to each other when you search. They are the same wine, and there should be just one record. If you click to buy it, you'll then be given the choice of retailers. This allows us to consolidate reviews and to provide a better SnoothRank as we continue to gather more information about each wine.
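To give a feel for the kind of matching involved, here is a minimal Python sketch. It is not our actual algorithm, just an illustration of the idea: reduce each retailer's name to a set of canonical tokens, then group listings whose token sets are identical. The names `ABBREVIATIONS`, `FILLER_WORDS`, `normalize`, `same_wine` and `merge_listings` are all made up for this example, the lookup tables are tiny placeholders, and the two-digit vintage rule is a deliberate simplification.

```python
import re
from collections import defaultdict

# Hypothetical lookup tables; a real system would need far larger ones.
ABBREVIATIONS = {"chard": "chardonnay", "reserva": "reserve"}
FILLER_WORDS = {"bodega", "bodegas"}


def normalize(name):
    """Reduce a retailer's wine name to a set of canonical tokens."""
    tokens = set()
    for token in re.findall(r"[a-z0-9]+", name.lower()):
        if token in FILLER_WORDS:
            continue  # drop words that carry no identifying information
        if re.fullmatch(r"\d{2}", token):
            # Assumption: a two-digit number is a vintage from the 2000s.
            token = "20" + token
        # Expand known abbreviations and spelling variants.
        tokens.add(ABBREVIATIONS.get(token, token))
    return frozenset(tokens)


def same_wine(a, b):
    """Treat two names as the same wine if their token sets match exactly."""
    return normalize(a) == normalize(b)


def merge_listings(listings):
    """Group (retailer, name) pairs under one key per distinct wine."""
    merged = defaultdict(list)
    for retailer, name in listings:
        merged[normalize(name)].append(retailer)
    return dict(merged)
```

With this sketch, `same_wine("Bodega Benegas 2005 Chardonnay Reserve", "05 Benegas Chard Reserva")` is true, and `merge_listings` keeps one record per wine with the list of retailers that sell it. Exact matching on token sets is brittle (a single misspelling defeats it), which is part of why the real thing took so much tweaking and testing.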

Of course, any of our Beta testers can probably still find numerous examples of wines that should be merged but that we've missed. We're still refining the algorithm, but in the meantime, email me and I'll fix them.
