Snooth Blog

Snooth User: Philip James

Mired in data

Posted by Philip James, May 17, 2007.

As we get closer to launch, I'm excited to be able to talk more about the site.

We spent a lot of time these last few weeks cleaning up the data. We'd passed 350,000 wines and 1.2 million ratings, and we had a backlog of retailer feeds to integrate. It was clear we needed to step back and take the time to consolidate duplicate records.

It has taken us several weeks and hundreds of hours of tweaking and testing, but we finally have an algorithm that we're proud of. We ran it yesterday and it caught and merged around 40,000 wines. We'll continue to refine it over time, but the bulk of the work has been done.

What this means is that if one retailer calls a wine "Bodega Benegas 2005 Chardonnay Reserve" and another calls it "05 Benegas Chard Reserva", you won't see both records next to each other when you search. They're the same wine, so there should be just one record. If you click to buy it, you'll then be given a choice of retailers. Merging also lets us consolidate reviews and provide a better SnoothRank as we continue to gather more information about each wine.
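
The post doesn't say how the merging algorithm works, so what follows is only a minimal Python sketch of one common approach to this kind of record matching, not Snooth's actual method. It assumes that each retailer's wine name is normalized into a set of canonical tokens (lowercasing, expanding abbreviations like "Chard" and "Reserva", turning a two-digit vintage into four digits, dropping generic words like "Bodega"), that records are compared with Jaccard similarity, and that matches are grouped with union-find. The `SYNONYMS` and `STOPWORDS` tables, the `normalize`, `jaccard` and `merge_records` functions, and the 0.8 threshold are all hypothetical.

```python
import re
from itertools import combinations

# Hypothetical abbreviation/synonym table; a real system would need a far larger one.
SYNONYMS = {
    "chard": "chardonnay",
    "reserva": "reserve",
    "cab": "cabernet",
    "sauv": "sauvignon",
}

# Hypothetical generic words that say little about which wine a record is.
STOPWORDS = {"bodega", "bodegas", "winery", "the"}


def normalize(name: str) -> frozenset[str]:
    """Turn a retailer's wine name into a set of canonical tokens."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    out = set()
    for tok in tokens:
        # Expand two-digit vintages such as "05" to "2005" (assumes 2000s vintages).
        if re.fullmatch(r"\d{2}", tok):
            tok = "20" + tok
        tok = SYNONYMS.get(tok, tok)
        if tok not in STOPWORDS:
            out.add(tok)
    return frozenset(out)


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Size of the shared token set divided by the size of the combined set."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def merge_records(names: list[str], threshold: float = 0.8) -> list[list[str]]:
    """Group names whose normalized token sets are similar enough, via union-find."""
    parent = list(range(len(names)))

    def find(i: int) -> int:
        # Follow parent links to the group's root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    keys = [normalize(n) for n in names]
    # Compare every pair once and join the groups of any pair above the threshold.
    for i, j in combinations(range(len(names)), 2):
        if jaccard(keys[i], keys[j]) >= threshold:
            parent[find(i)] = find(j)

    groups: dict[int, list[str]] = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return list(groups.values())


if __name__ == "__main__":
    wines = [
        "Bodega Benegas 2005 Chardonnay Reserve",
        "05 Benegas Chard Reserva",
        "Benegas 2006 Malbec",
    ]
    # The two Chardonnay records normalize to the same tokens; the Malbec stays separate.
    print(merge_records(wines))
```

Comparing every pair of records is quadratic, so at the scale of hundreds of thousands of wines a real system would likely restrict comparisons to smaller candidate blocks, for example records that share a producer and vintage.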

Of course, our Beta testers can probably still find plenty of wines that should have been merged but that we've missed. We're still refining the algorithm; in the meantime, email me and I'll fix them.
