<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: BookMooch will be a bit slow</title>
	<atom:link href="http://blog.bookmooch.com/2008/06/02/bookmooch-will-be-a-bit-slow/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.bookmooch.com/2008/06/02/bookmooch-will-be-a-bit-slow/</link>
	<description>Give books away, get books you want.</description>
	<lastBuildDate>Fri, 20 Nov 2009 14:19:39 +0000</lastBuildDate>
	<generator>http://wordpress.com/</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: John Buckman</title>
		<link>http://blog.bookmooch.com/2008/06/02/bookmooch-will-be-a-bit-slow/#comment-10854</link>
		<dc:creator>John Buckman</dc:creator>
		<pubDate>Wed, 11 Jun 2008 20:42:58 +0000</pubDate>
		<guid isPermaLink="false">http://blog.bookmooch.com/?p=678#comment-10854</guid>
		<description>David wrote:
&lt;i&gt;how much would using LIMIT to cut down the results returned improve the situation&lt;/i&gt;

Unfortunately, I need to get all the results to find out which books are available, as the default search results page shows only available books. Since the search index has ALL books, not just available ones, I need to get all the results anyway.

At any rate, this problem seems to be solved for now: a 1GB in-memory cache seems to have had a great performance impact, and I&#039;ve been running it for a few days, since of the various techniques I tried, that one worked best.</description>
		<content:encoded><![CDATA[<p>David wrote:<br />
<i>how much would using LIMIT to cut down the results returned improve the situation</i></p>
<p>Unfortunately, I need to get all the results to find out which books are available, as the default search results page shows only available books. Since the search index has ALL books, not just available ones, I need to get all the results anyway.</p>
<p>At any rate, this problem seems to be solved for now: a 1GB in-memory cache seems to have had a great performance impact, and I&#8217;ve been running it for a few days, since of the various techniques I tried, that one worked best.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: David A. Spitzley</title>
		<link>http://blog.bookmooch.com/2008/06/02/bookmooch-will-be-a-bit-slow/#comment-10847</link>
		<dc:creator>David A. Spitzley</dc:creator>
		<pubDate>Wed, 11 Jun 2008 19:02:59 +0000</pubDate>
		<guid isPermaLink="false">http://blog.bookmooch.com/?p=678#comment-10847</guid>
		<description>I actually think that the &quot;does it really need sorting?&quot; question points in an interesting direction: how much would using LIMIT to cut down the results returned improve the situation? I suppose a corollary to this is: how much of the server&#039;s resources are being used for queries whose results are too large to be digestible by a human?</description>
		<content:encoded><![CDATA[<p>I actually think that the &#8220;does it really need sorting?&#8221; question points in an interesting direction: how much would using LIMIT to cut down the results returned improve the situation? I suppose a corollary to this is: how much of the server&#8217;s resources are being used for queries whose results are too large to be digestible by a human?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Neil</title>
		<link>http://blog.bookmooch.com/2008/06/02/bookmooch-will-be-a-bit-slow/#comment-10805</link>
		<dc:creator>Neil</dc:creator>
		<pubDate>Wed, 11 Jun 2008 04:04:43 +0000</pubDate>
		<guid isPermaLink="false">http://blog.bookmooch.com/?p=678#comment-10805</guid>
		<description>It would seem to me this is sort of a &#039;solved problem&#039;, in that various places do this type of sorting (libraries, Amazon, etc.). Could you not offload it to Amazon somehow? Perhaps use their cloud computing service?</description>
		<content:encoded><![CDATA[<p>It would seem to me this is sort of a &#8216;solved problem&#8217;, in that various places do this type of sorting (libraries, Amazon, etc.). Could you not offload it to Amazon somehow? Perhaps use their cloud computing service?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: John Buckman</title>
		<link>http://blog.bookmooch.com/2008/06/02/bookmooch-will-be-a-bit-slow/#comment-10800</link>
		<dc:creator>John Buckman</dc:creator>
		<pubDate>Tue, 03 Jun 2008 20:52:05 +0000</pubDate>
		<guid isPermaLink="false">http://blog.bookmooch.com/?p=678#comment-10800</guid>
		<description>re: &quot;does it really need sorting?&quot;

You make some very good points.

It&#039;s true that nobody wants to look through 500,000 books with the word &quot;science&quot; in their title or topic, as a result of searching for &quot;science&quot;.

If a search returns only 20 books, sorted results are nice.

However, I think that search results might be more interesting sorted by &quot;relevancy&quot;, which probably means moving the most popular books to the top of the list, using either the &quot;number of times mooched&quot; or the &quot;Amazon sales ranking&quot;.

This kind of sort would probably be more helpful for any list of books that goes on for more than 2 pages.</description>
		<content:encoded><![CDATA[<p>re: &#8220;does it really need sorting?&#8221;</p>
<p>You make some very good points.</p>
<p>It&#8217;s true that nobody wants to look through 500,000 books with the word &#8220;science&#8221; in their title or topic, as a result of searching for &#8220;science&#8221;.</p>
<p>If a search returns only 20 books, sorted results are nice.</p>
<p>However, I think that search results might be more interesting sorted by &#8220;relevancy&#8221;, which probably means moving the most popular books to the top of the list, using either the &#8220;number of times mooched&#8221; or the &#8220;Amazon sales ranking&#8221;.</p>
<p>This kind of sort would probably be more helpful for any list of books that goes on for more than 2 pages.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: kirsty</title>
		<link>http://blog.bookmooch.com/2008/06/02/bookmooch-will-be-a-bit-slow/#comment-10799</link>
		<dc:creator>kirsty</dc:creator>
		<pubDate>Tue, 03 Jun 2008 20:45:46 +0000</pubDate>
		<guid isPermaLink="false">http://blog.bookmooch.com/?p=678#comment-10799</guid>
		<description>You can tell the sort of programmer I am because my first thought on your sorting problem was: does it really need sorting?  

For inventories and small lists, yes, I want to see an author&#039;s titles grouped together. If someone searches for a general term, though, are they really after an alphabetised list?

Alphabetical order is useful for finding a specific thing in a list, but if you want to find a specific thing among that many items then you can search for it using a specific term.  A list of half a million items isn&#039;t what a human wants to read even if it&#039;s in order.</description>
		<content:encoded><![CDATA[<p>You can tell the sort of programmer I am because my first thought on your sorting problem was: does it really need sorting?  </p>
<p>For inventories and small lists, yes, I want to see an author&#8217;s titles grouped together. If someone searches for a general term, though, are they really after an alphabetised list?</p>
<p>Alphabetical order is useful for finding a specific thing in a list, but if you want to find a specific thing among that many items then you can search for it using a specific term.  A list of half a million items isn&#8217;t what a human wants to read even if it&#8217;s in order.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Valerie in San Diego</title>
		<link>http://blog.bookmooch.com/2008/06/02/bookmooch-will-be-a-bit-slow/#comment-10798</link>
		<dc:creator>Valerie in San Diego</dc:creator>
		<pubDate>Tue, 03 Jun 2008 19:36:34 +0000</pubDate>
		<guid isPermaLink="false">http://blog.bookmooch.com/?p=678#comment-10798</guid>
		<description>I really love that you&#039;re sharing this technical info. Obviously it provides knowledgeable people the chance to chip in and try to help, but for a moderately technical person like myself, it&#039;s a great learning experience, and, of course, it promotes transparency, which should reduce complaints.</description>
		<content:encoded><![CDATA[<p>I really love that you&#8217;re sharing this technical info. Obviously it provides knowledgeable people the chance to chip in and try to help, but for a moderately technical person like myself, it&#8217;s a great learning experience, and, of course, it promotes transparency, which should reduce complaints.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: John Buckman</title>
		<link>http://blog.bookmooch.com/2008/06/02/bookmooch-will-be-a-bit-slow/#comment-10796</link>
		<dc:creator>John Buckman</dc:creator>
		<pubDate>Tue, 03 Jun 2008 13:25:53 +0000</pubDate>
		<guid isPermaLink="false">http://blog.bookmooch.com/?p=678#comment-10796</guid>
		<description>re: sharding &amp; partitioning

Yes, I know that I will eventually need to use those techniques, and the BookMooch architecture already supports them.

If you want to read about BerkeleyDB&#039;s replication technology, take a look at http://www.oracle.com/technology/products/berkeley-db/feature-sets.html -- this is replicated one-writer, multiple-reader, and will scale well as long as data-writing is kept under control.  What I&#039;m currently experimenting with is the effect that not using disk-based caching has on BookMooch performance, since writes are not only slow but also perform poorly in a replicated environment.

This replicated BerkeleyDB is currently in use by Google for their &quot;single sign-on&quot; feature.  

I do know all about memcached; its persistent variant, memcachedb, actually uses BerkeleyDB as its engine: http://memcachedb.org/

So do not worry, BookMooch can scale to very great heights. The issue for me now is to get the most I can out of one machine, so that costs will be minimized when I need to go to a multi-machine setup.</description>
		<content:encoded><![CDATA[<p>re: sharding &amp; partitioning</p>
<p>Yes, I know that I will eventually need to use those techniques, and the BookMooch architecture already supports them.</p>
<p>If you want to read about BerkeleyDB&#8217;s replication technology, take a look at <a href="http://www.oracle.com/technology/products/berkeley-db/feature-sets.html" rel="nofollow">http://www.oracle.com/technology/products/berkeley-db/feature-sets.html</a> &#8212; this is replicated one-writer, multiple-reader, and will scale well as long as data-writing is kept under control.  What I&#8217;m currently experimenting with is the effect that not using disk-based caching has on BookMooch performance, since writes are not only slow but also perform poorly in a replicated environment.</p>
<p>This replicated BerkeleyDB is currently in use by Google for their &#8220;single sign-on&#8221; feature.  </p>
<p>I do know all about memcached; its persistent variant, memcachedb, actually uses BerkeleyDB as its engine: <a href="http://memcachedb.org/" rel="nofollow">http://memcachedb.org/</a></p>
<p>So do not worry, BookMooch can scale to very great heights. The issue for me now is to get the most I can out of one machine, so that costs will be minimized when I need to go to a multi-machine setup.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ryan</title>
		<link>http://blog.bookmooch.com/2008/06/02/bookmooch-will-be-a-bit-slow/#comment-10795</link>
		<dc:creator>Ryan</dc:creator>
		<pubDate>Tue, 03 Jun 2008 13:14:07 +0000</pubDate>
		<guid isPermaLink="false">http://blog.bookmooch.com/?p=678#comment-10795</guid>
		<description>OK, I love that you&#039;re sharing this info.  I have some advice, which you probably already know, but I&#039;m going to share anyway.  

As BookMooch grows, adding more RAM and switching to solid-state drives are really just temporary solutions that won&#039;t scale very well, or for very long.  If you put these in place, before long you&#039;ll be right back in the same position, and eventually you won&#039;t be able to add more RAM or more SSDs.  You need a solid architecture redesign.

You&#039;re getting into the area of needing to think about database sharding and whatnot.  Also, memcached is a wonderful tool to aid in caching queries and dynamic pages.  There are probably hundreds of areas where a little bit of cache will help speed things up, but as far as the database goes, you&#039;re definitely going to need to start considering some type of partitioning and clustering.</description>
		<content:encoded><![CDATA[<p>OK, I love that you&#8217;re sharing this info.  I have some advice, which you probably already know, but I&#8217;m going to share anyway.  </p>
<p>As BookMooch grows, adding more RAM and switching to solid-state drives are really just temporary solutions that won&#8217;t scale very well, or for very long.  If you put these in place, before long you&#8217;ll be right back in the same position, and eventually you won&#8217;t be able to add more RAM or more SSDs.  You need a solid architecture redesign.</p>
<p>You&#8217;re getting into the area of needing to think about database sharding and whatnot.  Also, memcached is a wonderful tool to aid in caching queries and dynamic pages.  There are probably hundreds of areas where a little bit of cache will help speed things up, but as far as the database goes, you&#8217;re definitely going to need to start considering some type of partitioning and clustering.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: John Buckman</title>
		<link>http://blog.bookmooch.com/2008/06/02/bookmooch-will-be-a-bit-slow/#comment-10794</link>
		<dc:creator>John Buckman</dc:creator>
		<pubDate>Tue, 03 Jun 2008 13:03:24 +0000</pubDate>
		<guid isPermaLink="false">http://blog.bookmooch.com/?p=678#comment-10794</guid>
		<description>re: RAID5

I&#039;ve used RAID in the past and have always had data loss issues, as have other sysadmins I know. Google doesn&#039;t use RAID, and neither does the Internet Archive; I know the Internet Archive tried it but had RAID reliability issues as well.

Also, RAID5 is faster, but not 300x faster, which is the kind of gain a solid-state drive gives over a traditional hard disk drive.</description>
		<content:encoded><![CDATA[<p>re: RAID5</p>
<p>I&#8217;ve used RAID in the past and have always had data loss issues, as have other sysadmins I know. Google doesn&#8217;t use RAID, and neither does the Internet Archive; I know the Internet Archive tried it but had RAID reliability issues as well.</p>
<p>Also, RAID5 is faster, but not 300x faster, which is the kind of gain a solid-state drive gives over a traditional hard disk drive.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Graham (grahamt)</title>
		<link>http://blog.bookmooch.com/2008/06/02/bookmooch-will-be-a-bit-slow/#comment-10793</link>
		<dc:creator>Graham (grahamt)</dc:creator>
		<pubDate>Tue, 03 Jun 2008 12:34:25 +0000</pubDate>
		<guid isPermaLink="false">http://blog.bookmooch.com/?p=678#comment-10793</guid>
		<description>John, maybe you&#039;ve already thought about this, but have you considered a RAID disk array?  Spreading the data across multiple disks would involve a greater hardware cost, depending upon how many disks you used, although they&#039;re pretty cheap, relatively speaking, these days.  Using RAID5, for instance, would give not just improved data throughput but greater data security as well.  With RAID5 you can lose a whole disk and all the data will still be accessible. There&#039;s a useful article on RAID on Wikipedia if you want more information.  What database are you using?</description>
		<content:encoded><![CDATA[<p>John, maybe you&#8217;ve already thought about this, but have you considered a RAID disk array?  Spreading the data across multiple disks would involve a greater hardware cost, depending upon how many disks you used, although they&#8217;re pretty cheap, relatively speaking, these days.  Using RAID5, for instance, would give not just improved data throughput but greater data security as well.  With RAID5 you can lose a whole disk and all the data will still be accessible. There&#8217;s a useful article on RAID on Wikipedia if you want more information.  What database are you using?</p>
]]></content:encoded>
	</item>
</channel>
</rss>
