<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	>

<channel>
	<title>Tek-Dev</title>
	<atom:link href="http://www.tek-dev.com/blog/feed" rel="self" type="application/rss+xml" />
	<link>http://www.tek-dev.com/blog</link>
	<description>Software and web development blog ...</description>
	<pubDate>Wed, 01 Apr 2009 20:41:48 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.7.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Directory of IT services for Ireland &amp; UK</title>
		<link>http://www.tek-dev.com/blog/post/directory-of-it-services-in-ireland-uk-for-smes.aspx</link>
		<comments>http://www.tek-dev.com/blog/post/directory-of-it-services-in-ireland-uk-for-smes.aspx#comments</comments>
		<pubDate>Wed, 01 Apr 2009 20:24:19 +0000</pubDate>
		<dc:creator>steve</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.tek-dev.com/blog/?p=20</guid>
		<description><![CDATA[I am currently putting together a directory of IT services at http://www.tek-dev.com/dir/ for use by companies in Ireland and the UK. I am hoping to keep the directory concise while at the same time including as many quality services as possible. 
If you are an Irish or UK-based IT company that does not specialise [...]]]></description>
			<content:encoded><![CDATA[<p>I am currently putting together a directory of IT services at http://www.tek-dev.com/dir/ for use by companies in Ireland and the UK. I am hoping to keep the directory concise while at the same time including as many quality services as possible. </p>
<p>If you are an Irish or UK-based IT company that does not specialise in software development (conflict of interest!), please submit your link and a short description to the <a href="http://www.tek-dev.com/dir">directory</a> for inclusion, using the submit link.</p>
<p>If you cannot find a suitable category, leave a comment here and I will add one as appropriate.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.tek-dev.com/blog/post/directory-of-it-services-in-ireland-uk-for-smes.aspx/feed</wfw:commentRss>
		</item>
		<item>
		<title>C# Open Source Mail and Calendaring System</title>
		<link>http://www.tek-dev.com/blog/post/C-Open-Source-Mail-And-Calendaring-System.aspx</link>
		<comments>http://www.tek-dev.com/blog/post/C-Open-Source-Mail-And-Calendaring-System.aspx#comments</comments>
		<pubDate>Mon, 23 Mar 2009 18:00:32 +0000</pubDate>
		<dc:creator>steve</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.tek-dev.com/blog/?p=18</guid>
		<description><![CDATA[
I have spent the last few days looking at the Zimbra open source mail and calendaring system. This project is hosted by Yahoo and, on closer examination, seems to be only semi-open source. Let me explain: although the source code is available and the project seems to be seeking collaboration from the wider community, there are [...]]]></description>
			<content:encoded><![CDATA[<p>
I have spent the last few days looking at the <a href="http://www.zimbra.com" title="zimbra">Zimbra</a> open source mail and calendaring system. This project is hosted by Yahoo and, on closer examination, seems to be only semi-open source. Let me explain: although the source code is available and the project seems to be seeking collaboration from the wider community, there are several problems. The first is code availability. The Zimbra project has multiple versions, ranging from subscription-based supported versions all the way down to an open source version. The main issue here is that the open source version is stripped of some of the nice features that come with the subscription versions, which seems to go against the ethos of open source. The second problem I have with the project is the way in which contributors from outside Zimbra/Yahoo can contribute code changes and bug fixes. If you are not a Zimbra employee you cannot make changes to the code repositories directly; you have to send your contributions, along with a signed contract, to Zimbra, where a decision will be made on whether or not your contributions will be accepted.
</p>
<p>
This seems to me like an odd way to do business: coders who make valuable contributions to the project are still only entitled to a copy of the cut-down &quot;open source&quot; version, not the feature-rich one that subscribers pay for. At this rate I am surprised there are any contributors at all! One would be forgiven for speculating that Zimbra has fulfilled the bare minimum necessary to brand its product open source in order to benefit from a wide variety of open source tools such as <a href="http://lucene.apache.org/">Lucene</a>, <a href="http://www.postfix.org/">Postfix</a>, <a href="http://www.openldap.org/" title="OpenLDAP">OpenLDAP</a>, <a href="http://www.mysql.com/">MySQL</a>, etc.
</p>
<p>
Aside from the particular model the project uses to operate, the software is very good and the only serious open source competitor to the Exchange servers of this world! What I like most about the Zimbra Collaboration Suite is its web client. The interface is built almost entirely in JavaScript, which is contained within a number of Java JSP pages. What struck me about this architecture is that if most of the nuts and bolts are JavaScript, how difficult would it be to launch this type of interface from a few ASP.NET .aspx pages?
</p>
<p>
What I am proposing to do is to launch a new C# open source web-based mail/calendar application using this type of technology. The project would be much simpler and would contain the following components:
</p>
<p>
A free database with an ADO.NET client, such as Firebird.
</p>
<p>
A server component which contains all of the data access and business logic.
</p>
<p>
An open source SMTP mailer such as <a href="http://www.ericdaugherty.com/dev/cses/">CSES</a>, which will be used by the server component.
</p>
<p>
An ASP.NET front end which makes use of the Zimbra JavaScript calendar and mail client objects.
</p>
<p>
Possibly a desktop client which provides similar functionality to the web client.
</p>
<p>
As a first pass I don&#39;t intend to integrate security with Active Directory; a simple forms-based security model could be implemented in the server to authenticate potential users of the system.
</p>
<p>
I have already been playing around with a couple of C# prototypes to see what sort of architecture the system should take on, but what I will really need help with is porting the JavaScript client from Zimbra&#39;s Java platform to .NET. I am looking for as many interested .NET/Java/JavaScript developers as possible to help me get this thing off the ground, so if you are interested please leave a comment at the end of this entry and I will be in touch.
</p>
]]></content:encoded>
			<wfw:commentRss>http://www.tek-dev.com/blog/post/C-Open-Source-Mail-And-Calendaring-System.aspx/feed</wfw:commentRss>
		</item>
		<item>
		<title>When to use rentacoder or elance for outsourcing a software project</title>
		<link>http://www.tek-dev.com/blog/post/When-to-use-rentacoder-or-elance-for-outsourcing-a-project.aspx</link>
		<comments>http://www.tek-dev.com/blog/post/When-to-use-rentacoder-or-elance-for-outsourcing-a-project.aspx#comments</comments>
		<pubDate>Sat, 21 Mar 2009 00:22:14 +0000</pubDate>
		<dc:creator>steve</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.tek-dev.com/blog/?p=13</guid>
		<description><![CDATA[
A lot of people have by now heard about websites such as rentacoder.com and elance.com. These are online marketplaces where software buyers can pay coders to create desktop and web applications using an auction system similar to eBay. A software buyer places his requirements on the site, where coders review the [...]]]></description>
			<content:encoded><![CDATA[<p>
A lot of people have by now heard about websites such as rentacoder.com and elance.com. These are online marketplaces where software buyers can pay coders to create desktop and web applications using an auction system similar to eBay. A software buyer places his requirements on the site, where coders review the requirements and bid on the work according to how much they are willing to complete it for. At a glance this seems like a good service: buyers get good value from the competing coders, while freelancers get a steady stream of work. But is it?
</p>
<p>
I would say the answer is both yes and no, depending on project type and size.
</p>
<p>
The first important point to note about these services is the profile of the programmers and freelancers registered on the sites. Because the sites have an international audience, projects tend to get lower bids from developing countries. Projects are generally awarded to the lower bids, and because jobs so often go to emerging countries, many coders from developed countries do not take part in the auctions. At these sites there will typically be a lot of bids from countries such as India, Romania, Pakistan and Sri Lanka, with only occasional bids from developed countries such as the United States or Ireland. This does not mean, however, that the work will be substandard. It can be quite the contrary, in fact, with countries like India having some of the best software engineering schools in the world.
</p>
<p>
One of the main challenges of using offshore development teams that are not native English speakers is communication, and even when English is the native language, dialects can vary a lot depending on location. A lot of the time this obstacle can be overcome by using written forms of communication such as email and instant messaging, although this becomes impractical on larger projects. One important aspect of the written communication used in these services is that it can be used as evidence by site mediators should the project go into arbitration. This does, however, depend on the site-approved messaging forums being used rather than external email or messenger.
</p>
<p>
Another important point to note about the coders on this type of service is that they are often individuals with other full-time jobs, which gives them limited bandwidth for their freelance work. Having said that, a lot of the bidders are companies with multiple coders (or at least claim to have multiple coders!). On elance, many more bids seem to come from companies than from individuals. I suspect this bias is brought about by the expensive subscription fees charged by elance: companies with a high turnover of work and revenue seem to favor this site because the fee is a flat monthly rate rather than a percentage of the profit, which is the model employed by rentacoder.
</p>
<p>
A point to note when placing a job on one of these services is the level of technical expertise held by the poster. If the person posting the job does not have at least some grounding in software development or engineering, this can lead to problems. One of the main complaints from coders on the sites is that software buyers provide ambiguous or unrealistic requirements. This can lead to the job being ignored by coders or, worse still, to the job being won by a coder who fails to deliver because the requirements were not specific. If the job goes into arbitration before it is complete, the buyer risks losing money because the software loosely meets the buyer&#39;s written requirements without delivering what the buyer actually wants.
</p>
<p>
So, in summary, this can be a good service which provides low-cost software; however, there is still some risk, as discussed above. One of the main deciding factors in whether or not to use one of these services is the size and complexity of the project. If it is a large and complex project, I would say hire a local developer or team with whom you can at least have some face-to-face meetings. Similarly, if you do not have knowledge of basic software development processes, I would hire a systems analyst or similar professional to at least help you with the requirements. But if you have a smaller project where you know what you want, and failure of the project will not expose you or your company to substantial risk, then I would say go for it <img src='http://www.tek-dev.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />
</p>
<p>
So whatever you decide, good luck with your software endeavors!
</p>
]]></content:encoded>
			<wfw:commentRss>http://www.tek-dev.com/blog/post/When-to-use-rentacoder-or-elance-for-outsourcing-a-project.aspx/feed</wfw:commentRss>
		</item>
		<item>
		<title>Open Source Enterprise Search Review Part Four</title>
		<link>http://www.tek-dev.com/blog/post/Open-Source-Enterprise-Search-Review-Part-Four.aspx</link>
		<comments>http://www.tek-dev.com/blog/post/Open-Source-Enterprise-Search-Review-Part-Four.aspx#comments</comments>
		<pubDate>Fri, 20 Mar 2009 23:57:04 +0000</pubDate>
		<dc:creator>steve</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.tek-dev.com/blog/?p=11</guid>
		<description><![CDATA[
In the final part of this series we will look at a very simple web page that takes a search term, then posts the term to a search page that executes the query for us and displays the results. The Lucene object that reads the index is called an IndexReader. Before the query [...]]]></description>
			<content:encoded><![CDATA[<p>
In the final part of this series we will look at a very simple web page that takes a search term, then posts the term to a search page that executes the query for us and displays the results. The Lucene object that reads the index is called an IndexReader. Before the query is run it needs to be processed by our friend the analyser. Depending on the analyser, this cleans up the text (for example stripping punctuation, lowercasing, and in some analysers reducing plurals to their stem) so that clean terms are tokenised and used to query the index. It is important to use the same analyser to read the index as we used to write to it. We used the standard analyser to process the terms that were written to the index, so we will use it again to process the search terms. The following is the simple JSP page used to take the search term and post it to the results page.
</p>
<pre>
<font color="blue">&lt;</font><font color="maroon">html</font><font color="blue">&gt;</font><font color="black">
&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="blue">&lt;</font><font color="maroon">head</font><font color="blue">&gt;</font><font color="black">
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="blue">&lt;</font><font color="maroon">meta</font><font color="red">&nbsp;http-equiv</font><font color="blue">=&quot;Content-Type&quot;</font><font color="red">&nbsp;content</font><font color="blue">=&quot;text/html;&nbsp;charset=UTF-8&quot;&gt;</font><font color="black">
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="blue">&lt;</font><font color="maroon">title</font><font color="blue">&gt;</font><font color="black">JSP&nbsp;Search&nbsp;Page</font><font color="blue">&lt;/</font><font color="maroon">title</font><font color="blue">&gt;</font><font color="black">
&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="blue">&lt;/</font><font color="maroon">head</font><font color="blue">&gt;</font><font color="black">
&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="blue">&lt;</font><font color="maroon">body</font><font color="blue">&gt;</font><font color="black">
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="blue">&lt;</font><font color="maroon">form</font><font color="red">&nbsp;name</font><font color="blue">=&quot;search&quot;</font><font color="red">&nbsp;action</font><font color="blue">=&quot;results.jsp&quot;</font><font color="red">&nbsp;method</font><font color="blue">=&quot;get&quot;&gt;</font><font color="black">
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="blue">&lt;</font><font color="maroon">p</font><font color="blue">&gt;</font><font color="black">
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="blue">&lt;</font><font color="maroon">input</font><font color="red">&nbsp;name</font><font color="blue">=&quot;query&quot;</font><font color="red">&nbsp;size</font><font color="blue">=&quot;44&quot;/&gt;&nbsp;Search&nbsp;Criteria
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;/</font><font color="maroon">p</font><font color="blue">&gt;</font><font color="black">
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="blue">&lt;</font><font color="maroon">p</font><font color="blue">&gt;</font><font color="black">
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="blue">&lt;</font><font color="maroon">input</font><font color="red">&nbsp;name</font><font color="blue">=&quot;maxresults&quot;</font><font color="red">&nbsp;size</font><font color="blue">=&quot;4&quot;</font><font color="red">&nbsp;value</font><font color="blue">=&quot;100&quot;/&gt;&nbsp;Results&nbsp;Per&nbsp;Page&nbsp;
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;</font><font color="maroon">input</font><font color="red">&nbsp;type</font><font color="blue">=&quot;submit&quot;</font><font color="red">&nbsp;value</font><font color="blue">=&quot;Search&quot;/&gt;
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;/</font><font color="maroon">p</font><font color="blue">&gt;</font><font color="black">
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="blue">&lt;/</font><font color="maroon">form</font><font color="blue">&gt;</font><font color="black">
&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="blue">&lt;/</font><font color="maroon">body</font><font color="blue">&gt;</font><font color="black">
</font><font color="blue">&lt;/</font><font color="maroon">html</font><font color="blue">&gt;</font>
</pre>
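<p>
To see why the same analyser has to be used on both sides, here is a minimal, self-contained Java sketch. It does not use Lucene at all: the AnalyserSketch class, its analyse and matches methods and the stop-word list are all made up for illustration, and the toy analyser only lowercases, strips punctuation and drops a few stop words. A real Lucene analyser does more, but the idea is the same.
</p>
<pre>
```java
import java.util.Arrays;
import java.util.Locale;

public class AnalyserSketch {
    // Hypothetical stop-word list for this sketch, not Lucene's own list.
    static final String[] STOP_WORDS = {"the", "a", "an", "and", "of"};

    // Toy analyser: lowercase the text, turn every run of characters that is
    // not a letter or digit into a single space, split on spaces and drop
    // the stop words.
    public static String[] analyse(String text) {
        String cleaned = text.toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9]+", " ")
                .trim();
        StringBuilder kept = new StringBuilder();
        for (String token : cleaned.split(" ")) {
            if (!Arrays.asList(STOP_WORDS).contains(token)) {
                kept.append(token).append(' ');
            }
        }
        String joined = kept.toString().trim();
        return joined.isEmpty() ? new String[0] : joined.split(" ");
    }

    // True if every query term is present among the indexed terms.
    public static boolean matches(String[] indexedTerms, String[] queryTerms) {
        for (String term : queryTerms) {
            if (!Arrays.asList(indexedTerms).contains(term)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Terms as they would be stored at index time.
        String[] indexed = analyse("The Lucene Index-Reader, explained!");

        // Query passed through the same analyser: prints true.
        System.out.println(matches(indexed, analyse("LUCENE reader")));

        // Raw query terms that skipped the analyser: prints false,
        // because the index only holds lowercased terms.
        System.out.println(matches(indexed, new String[] {"LUCENE", "Reader"}));
    }
}
```
</pre>
<p>
The second call fails only because the query terms were never normalised, which is the kind of silent mismatch you risk when the index is written with one analyser and searched with another.
</p>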
<p>
The following is the results page that takes the search parameters and displays the results. Most of the code is concerned with paging the results; the piece of interest is the index reader that takes the search term, processes it with the standard analyser and outputs the results. This results page is taken directly from the Lucene documentation with a few modifications:
</p>
<p>
<font color="black">&lt;%<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="darkgreen">/*<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Author:&nbsp;Andrew&nbsp;C.&nbsp;Oliver,&nbsp;SuperLink&nbsp;Software,&nbsp;Inc.&nbsp;(acoliver2@users.sourceforge.net)<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;This&nbsp;jsp&nbsp;page&nbsp;is&nbsp;deliberately&nbsp;written&nbsp;in&nbsp;the&nbsp;horrible&nbsp;java&nbsp;directly&nbsp;embedded&nbsp;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;in&nbsp;the&nbsp;page&nbsp;style&nbsp;for&nbsp;an&nbsp;easy&nbsp;and&nbsp;concise&nbsp;demonstration&nbsp;of&nbsp;Lucene.<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Do&nbsp;note&#8230;if&nbsp;you&nbsp;write&nbsp;pages&nbsp;that&nbsp;look&nbsp;like&nbsp;this&#8230;sooner&nbsp;or&nbsp;later<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;you&#39;ll&nbsp;have&nbsp;a&nbsp;maintenance&nbsp;nightmare.&nbsp;&nbsp;If&nbsp;you&nbsp;use&nbsp;jsps&#8230;use&nbsp;taglibs<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;and&nbsp;beans!&nbsp;&nbsp;That&nbsp;being&nbsp;said,&nbsp;this&nbsp;should&nbsp;be&nbsp;acceptable&nbsp;for&nbsp;a&nbsp;small<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;page&nbsp;demonstrating&nbsp;how&nbsp;one&nbsp;uses&nbsp;Lucene&nbsp;in&nbsp;a&nbsp;web&nbsp;app.&nbsp;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;This&nbsp;is&nbsp;also&nbsp;deliberately&nbsp;overcommented.&nbsp;;-)<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;*/<br />
</font><font color="black">%&gt;<br />
&lt;%!<br />
&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="blue">public&nbsp;</font><font color="black">String&nbsp;escapeHTML(String&nbsp;s)&nbsp;{<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;s&nbsp;</font><font color="blue">=&nbsp;</font><font color="black">s.replaceAll(</font><font color="#808080">&quot;&amp;&quot;</font><font color="black">,&nbsp;</font><font color="#808080">&quot;&amp;amp;&quot;</font><font color="black">)</font><font color="blue">;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">s&nbsp;</font><font color="blue">=&nbsp;</font><font color="black">s.replaceAll(</font><font color="#808080">&quot;&lt;&quot;</font><font color="black">,&nbsp;</font><font color="#808080">&quot;&amp;lt;&quot;</font><font color="black">)</font><font color="blue">;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">s&nbsp;</font><font color="blue">=&nbsp;</font><font color="black">s.replaceAll(</font><font color="#808080">&quot;&gt;&quot;</font><font color="black">,&nbsp;</font><font color="#808080">&quot;&amp;gt;&quot;</font><font color="black">)</font><font color="blue">;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">s&nbsp;</font><font color="blue">=&nbsp;</font><font color="black">s.replaceAll(</font><font color="#808080">&quot;\&quot;&quot;</font><font color="black">,&nbsp;</font><font color="#808080">&quot;&amp;quot;&quot;</font><font color="black">)</font><font color="blue">;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">s&nbsp;</font><font color="blue">=&nbsp;</font><font color="black">s.replaceAll(</font><font color="#808080">&quot;&#39;&quot;</font><font color="black">,&nbsp;</font><font color="#808080">&quot;&amp;#39;&quot;</font><font color="black">)</font><font color="blue">;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return&nbsp;</font><font color="black">s</font><font color="blue">;<br />
&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">}<br />
%&gt;</p>
<p>&lt;%<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="blue">boolean&nbsp;</font><font color="black">error&nbsp;</font><font color="blue">=&nbsp;false;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="darkgreen">//used&nbsp;to&nbsp;control&nbsp;flow&nbsp;for&nbsp;error&nbsp;messages<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">String&nbsp;indexName&nbsp;</font><font color="blue">=&nbsp;</font><font color="#808080">&quot;/opt/lucene/index&quot;</font><font color="blue">;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="darkgreen">//local&nbsp;copy&nbsp;of&nbsp;the&nbsp;configuration&nbsp;variable<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">IndexSearcher&nbsp;searcher&nbsp;</font><font color="blue">=&nbsp;null;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="darkgreen">//the&nbsp;searcher&nbsp;used&nbsp;to&nbsp;open/search&nbsp;the&nbsp;index<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">Query&nbsp;query&nbsp;</font><font color="blue">=&nbsp;null;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="darkgreen">//the&nbsp;Query&nbsp;created&nbsp;by&nbsp;the&nbsp;QueryParser<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">Hits&nbsp;hits&nbsp;</font><font color="blue">=&nbsp;null;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="darkgreen">//the&nbsp;search&nbsp;results<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="blue">int&nbsp;</font><font color="black">startindex&nbsp;</font><font color="blue">=&nbsp;</font><font color="maroon">0</font><font color="blue">;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="darkgreen">//the&nbsp;first&nbsp;index&nbsp;displayed&nbsp;on&nbsp;this&nbsp;page<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="blue">int&nbsp;</font><font color="black">maxpage&nbsp;</font><font color="blue">=&nbsp;</font><font color="maroon">50</font><font color="blue">;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="darkgreen">//the&nbsp;maximum&nbsp;items&nbsp;displayed&nbsp;on&nbsp;this&nbsp;page<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">String&nbsp;queryString&nbsp;</font><font color="blue">=&nbsp;null;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="darkgreen">//the&nbsp;query&nbsp;entered&nbsp;in&nbsp;the&nbsp;previous&nbsp;page<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">String&nbsp;startVal&nbsp;</font><font color="blue">=&nbsp;null;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="darkgreen">//string&nbsp;version&nbsp;of&nbsp;startindex<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">String&nbsp;maxresults&nbsp;</font><font color="blue">=&nbsp;null;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="darkgreen">//string&nbsp;version&nbsp;of&nbsp;maxpage<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="blue">int&nbsp;</font><font color="black">thispage&nbsp;</font><font color="blue">=&nbsp;</font><font color="maroon">0</font><font color="blue">;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="darkgreen">//used&nbsp;for&nbsp;the&nbsp;for/next&nbsp;either&nbsp;maxpage&nbsp;or<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;//hits.length()&nbsp;-&nbsp;startindex&nbsp;-&nbsp;whichever&nbsp;is<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;//less</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="blue">try&nbsp;</font><font color="black">{<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;searcher&nbsp;</font><font color="blue">=&nbsp;new&nbsp;</font><font color="black">IndexSearcher(indexName)</font><font color="blue">;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="darkgreen">//create&nbsp;an&nbsp;indexSearcher&nbsp;for&nbsp;our&nbsp;page<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;//NOTE:&nbsp;this&nbsp;operation&nbsp;is&nbsp;slow&nbsp;for&nbsp;large<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;//indices&nbsp;(much&nbsp;slower&nbsp;than&nbsp;the&nbsp;search&nbsp;itself)<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;//so&nbsp;you&nbsp;might&nbsp;want&nbsp;to&nbsp;keep&nbsp;an&nbsp;IndexSearcher&nbsp;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;//open</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">}&nbsp;</font><font color="blue">catch&nbsp;</font><font color="black">(Exception&nbsp;e)&nbsp;{&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="darkgreen">//any&nbsp;error&nbsp;that&nbsp;happens&nbsp;is&nbsp;probably&nbsp;due<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;//to&nbsp;a&nbsp;permission&nbsp;problem&nbsp;or&nbsp;non-existent<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;//or&nbsp;otherwise&nbsp;corrupt&nbsp;index<br />
</font><font color="black">%&gt;<br />
&lt;p&gt;ERROR&nbsp;opening&nbsp;the&nbsp;Index&nbsp;-&nbsp;contact&nbsp;sysadmin!&lt;/p&gt;<br />
&lt;p&gt;Error&nbsp;message:&nbsp;&lt;%</font><font color="blue">=</font><font color="black">escapeHTML(e.getMessage())%&gt;&lt;/p&gt;&nbsp;&nbsp;&nbsp;<br />
&lt;%&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;error&nbsp;</font><font color="blue">=&nbsp;true;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="darkgreen">//don&#39;t&nbsp;do&nbsp;anything&nbsp;up&nbsp;to&nbsp;the&nbsp;footer<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">}<br />
%&gt;<br />
&lt;%<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="blue">if&nbsp;</font><font color="black">(error&nbsp;</font><font color="blue">==&nbsp;false</font><font color="black">)&nbsp;{&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="darkgreen">//did&nbsp;we&nbsp;open&nbsp;the&nbsp;index?<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">queryString&nbsp;</font><font color="blue">=&nbsp;</font><font color="black">request.getParameter(</font><font color="#808080">&quot;query&quot;</font><font color="black">)</font><font color="blue">;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="darkgreen">//get&nbsp;the&nbsp;search&nbsp;criteria<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">startVal&nbsp;</font><font color="blue">=&nbsp;</font><font color="black">request.getParameter(</font><font color="#808080">&quot;startat&quot;</font><font color="black">)</font><font color="blue">;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="darkgreen">//get&nbsp;the&nbsp;start&nbsp;index<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">maxresults&nbsp;</font><font color="blue">=&nbsp;</font><font color="black">request.getParameter(</font><font color="#808080">&quot;maxresults&quot;</font><font color="black">)</font><font color="blue">;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="darkgreen">//get&nbsp;max&nbsp;results&nbsp;per&nbsp;page<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="blue">try&nbsp;</font><font color="black">{<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;maxpage&nbsp;</font><font color="blue">=&nbsp;</font><font color="black">Integer.parseInt(maxresults)</font><font color="blue">;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="darkgreen">//parse&nbsp;the&nbsp;max&nbsp;results&nbsp;first<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">startindex&nbsp;</font><font color="blue">=&nbsp;</font><font color="black">Integer.parseInt(startVal)</font><font color="blue">;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="darkgreen">//then&nbsp;the&nbsp;start&nbsp;index&nbsp;&nbsp;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">}&nbsp;</font><font color="blue">catch&nbsp;</font><font color="black">(Exception&nbsp;e)&nbsp;{<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;</font><font color="darkgreen">//we&nbsp;don&#39;t&nbsp;care&nbsp;if&nbsp;something&nbsp;happens&nbsp;we&#39;ll&nbsp;just&nbsp;start&nbsp;at&nbsp;0<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;//or&nbsp;end&nbsp;at&nbsp;50</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="blue">if&nbsp;</font><font color="black">(queryString&nbsp;</font><font color="blue">==&nbsp;null</font><font color="black">)&nbsp;{<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="blue">throw&nbsp;new&nbsp;</font><font color="black">ServletException(</font><font color="#808080">&quot;no&nbsp;query&nbsp;&quot;&nbsp;</font><font color="black">+&nbsp;</font><font color="darkgreen">//if&nbsp;you&nbsp;don&#39;t&nbsp;have&nbsp;a&nbsp;query&nbsp;then<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="#808080">&quot;specified&quot;</font><font color="black">)</font><font color="blue">;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="darkgreen">//you&nbsp;probably&nbsp;played&nbsp;on&nbsp;the&nbsp;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">}&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="darkgreen">//query&nbsp;string&nbsp;so&nbsp;you&nbsp;get&nbsp;the&nbsp;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;//treatment</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">Analyzer&nbsp;analyzer&nbsp;</font><font color="blue">=&nbsp;new&nbsp;</font><font color="black">StandardAnalyzer()</font><font color="blue">;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="darkgreen">//construct&nbsp;our&nbsp;usual&nbsp;analyzer<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="blue">try&nbsp;</font><font color="black">{<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;QueryParser&nbsp;qp&nbsp;</font><font color="blue">=&nbsp;new&nbsp;</font><font color="black">QueryParser(</font><font color="#808080">&quot;contents&quot;</font><font color="black">,&nbsp;analyzer)</font><font color="blue">;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">query&nbsp;</font><font color="blue">=&nbsp;</font><font color="black">qp.parse(queryString)</font><font color="blue">;&nbsp;</font><font color="darkgreen">//parse&nbsp;the&nbsp;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">}&nbsp;</font><font color="blue">catch&nbsp;</font><font color="black">(ParseException&nbsp;e)&nbsp;{&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="darkgreen">//query&nbsp;and&nbsp;construct&nbsp;the&nbsp;Query<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;//object<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;//if&nbsp;it&#39;s&nbsp;just&nbsp;&quot;operator&nbsp;error&quot;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;//send&nbsp;them&nbsp;a&nbsp;nice&nbsp;error&nbsp;HTML</p>
<p></font><font color="black">%&gt;<br />
&lt;p&gt;Error&nbsp;while&nbsp;parsing&nbsp;query:&nbsp;&lt;%</font><font color="blue">=</font><font color="black">escapeHTML(e.getMessage())%&gt;&lt;/p&gt;<br />
&lt;%<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;error&nbsp;</font><font color="blue">=&nbsp;true;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="darkgreen">//don&#39;t&nbsp;bother&nbsp;with&nbsp;the&nbsp;rest&nbsp;of<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;//the&nbsp;page<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">}<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<br />
%&gt;<br />
&lt;%<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="blue">if&nbsp;</font><font color="black">(error&nbsp;</font><font color="blue">==&nbsp;false&nbsp;</font><font color="black">&amp;&amp;&nbsp;searcher&nbsp;!</font><font color="blue">=&nbsp;null</font><font color="black">)&nbsp;{&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="darkgreen">//&nbsp;if&nbsp;we&#39;ve&nbsp;had&nbsp;no&nbsp;errors<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;//&nbsp;searcher&nbsp;!=&nbsp;null&nbsp;was&nbsp;to&nbsp;handle<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;//&nbsp;a&nbsp;weird&nbsp;compilation&nbsp;bug&nbsp;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">thispage&nbsp;</font><font color="blue">=&nbsp;</font><font color="black">maxpage</font><font color="blue">;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="darkgreen">//&nbsp;default&nbsp;last&nbsp;element&nbsp;to&nbsp;maxpage<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">hits&nbsp;</font><font color="blue">=&nbsp;</font><font color="black">searcher.search(query)</font><font color="blue">;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="darkgreen">//&nbsp;run&nbsp;the&nbsp;query&nbsp;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="blue">if&nbsp;</font><font color="black">(hits.length()&nbsp;</font><font color="blue">==&nbsp;</font><font color="maroon">0</font><font color="black">)&nbsp;{&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="darkgreen">//&nbsp;if&nbsp;we&nbsp;got&nbsp;no&nbsp;results&nbsp;tell&nbsp;the&nbsp;user<br />
</font><font color="black">%&gt;<br />
&lt;p&gt;&nbsp;I&#39;m&nbsp;sorry&nbsp;I&nbsp;couldn&#39;t&nbsp;find&nbsp;what&nbsp;you&nbsp;were&nbsp;looking&nbsp;for.&nbsp;&lt;/p&gt;<br />
&lt;%<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;error&nbsp;</font><font color="blue">=&nbsp;true;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="darkgreen">//&nbsp;don&#39;t&nbsp;bother&nbsp;with&nbsp;the&nbsp;rest&nbsp;of&nbsp;the<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;//&nbsp;page<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">}<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="blue">if&nbsp;</font><font color="black">(error&nbsp;</font><font color="blue">==&nbsp;false&nbsp;</font><font color="black">&amp;&amp;&nbsp;searcher&nbsp;!</font><font color="blue">=&nbsp;null</font><font color="black">)&nbsp;{<br />
%&gt;<br />
&lt;table&gt;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&lt;tr&gt;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;td&gt;Document&lt;/td&gt;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;td&gt;Summary&lt;/td&gt;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&lt;/tr&gt;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&lt;%<br />
&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="blue">if&nbsp;</font><font color="black">((startindex&nbsp;+&nbsp;maxpage)&nbsp;&gt;&nbsp;hits.length())&nbsp;{<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;thispage&nbsp;</font><font color="blue">=&nbsp;</font><font color="black">hits.length()&nbsp;-&nbsp;startindex</font><font color="blue">;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="darkgreen">//&nbsp;set&nbsp;the&nbsp;max&nbsp;index&nbsp;to&nbsp;maxpage&nbsp;or&nbsp;last<br />
&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">}&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="darkgreen">//&nbsp;actual&nbsp;search&nbsp;result&nbsp;whichever&nbsp;is&nbsp;less</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="blue">for&nbsp;</font><font color="black">(</font><font color="blue">int&nbsp;</font><font color="black">i&nbsp;</font><font color="blue">=&nbsp;</font><font color="black">startindex</font><font color="blue">;&nbsp;</font><font color="black">i&nbsp;&lt;&nbsp;(thispage&nbsp;+&nbsp;startindex)</font><font color="blue">;&nbsp;</font><font color="black">i++)&nbsp;{&nbsp;&nbsp;</font><font color="darkgreen">//&nbsp;for&nbsp;each&nbsp;element<br />
</font><font color="black">%&gt;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&lt;tr&gt;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;%<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Document&nbsp;doc&nbsp;</font><font color="blue">=&nbsp;</font><font color="black">hits.doc(i)</font><font color="blue">;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="darkgreen">//get&nbsp;the&nbsp;next&nbsp;document&nbsp;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">String&nbsp;doctitle&nbsp;</font><font color="blue">=&nbsp;</font><font color="black">doc.</font><font color="blue">get</font><font color="black">(</font><font color="#808080">&quot;title&quot;</font><font color="black">)</font><font color="blue">;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="darkgreen">//get&nbsp;its&nbsp;title<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">String&nbsp;url&nbsp;</font><font color="blue">=&nbsp;</font><font color="black">doc.</font><font color="blue">get</font><font color="black">(</font><font color="#808080">&quot;path&quot;</font><font color="black">)</font><font color="blue">;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="darkgreen">//get&nbsp;its&nbsp;path&nbsp;field<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="blue">if&nbsp;</font><font color="black">(url&nbsp;!</font><font color="blue">=&nbsp;null&nbsp;</font><font color="black">&amp;&amp;&nbsp;url.startsWith(</font><font color="#808080">&quot;../webapps/&quot;</font><font color="black">))&nbsp;{&nbsp;</font><font color="darkgreen">//&nbsp;strip&nbsp;off&nbsp;../webapps&nbsp;prefix&nbsp;if&nbsp;present<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">url&nbsp;</font><font color="blue">=&nbsp;</font><font color="black">url.substring(</font><font color="maroon">10</font><font color="black">)</font><font color="blue">;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">}<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="blue">if&nbsp;</font><font color="black">((doctitle&nbsp;</font><font color="blue">==&nbsp;null</font><font color="black">)&nbsp;||&nbsp;doctitle.equals(</font><font color="#808080">&quot;&quot;</font><font color="black">))&nbsp;</font><font color="darkgreen">//use&nbsp;the&nbsp;path&nbsp;if&nbsp;it&nbsp;has&nbsp;no&nbsp;title<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">{<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;doctitle&nbsp;</font><font color="blue">=&nbsp;</font><font color="black">url</font><font color="blue">;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">}<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="darkgreen">//then&nbsp;output!<br />
</font><font color="black">%&gt;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;td&gt;&lt;a&nbsp;href</font><font color="blue">=</font><font color="#808080">&quot;&lt;%=url%&gt;&quot;</font><font color="black">&gt;&lt;%</font><font color="blue">=</font><font color="black">doctitle%&gt;&lt;/a&gt;&lt;/td&gt;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;td&gt;&lt;%</font><font color="blue">=</font><font color="black">doc.</font><font color="blue">get</font><font color="black">(</font><font color="#808080">&quot;summary&quot;</font><font color="black">)%&gt;&lt;/td&gt;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&lt;/tr&gt;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&lt;%<br />
&nbsp;&nbsp;&nbsp;&nbsp;}<br />
&nbsp;&nbsp;&nbsp;&nbsp;%&gt;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&lt;%&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="blue">if&nbsp;</font><font color="black">((startindex&nbsp;+&nbsp;maxpage)&nbsp;&lt;&nbsp;hits.length())&nbsp;{&nbsp;&nbsp;&nbsp;</font><font color="darkgreen">//if&nbsp;there&nbsp;are&nbsp;more&nbsp;results&#8230;display&nbsp;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;//the&nbsp;more&nbsp;link</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">String&nbsp;moreurl&nbsp;</font><font color="blue">=&nbsp;</font><font color="#808080">&quot;results.jsp?query=&quot;&nbsp;</font><font color="black">+<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;URLEncoder.encode(queryString,&nbsp;&quot;UTF-8&quot;)&nbsp;+&nbsp;</font><font color="darkgreen">//construct&nbsp;the&nbsp;&quot;more&quot;&nbsp;link<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="#808080">&quot;&amp;maxresults=&quot;&nbsp;</font><font color="black">+&nbsp;maxpage&nbsp;+<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="#808080">&quot;&amp;startat=&quot;&nbsp;</font><font color="black">+&nbsp;(startindex&nbsp;+&nbsp;maxpage)</font><font color="blue">;<br />
&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">%&gt;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&lt;tr&gt;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;td&gt;&lt;/td&gt;&lt;td&gt;&lt;a&nbsp;href</font><font color="blue">=</font><font color="#808080">&quot;&lt;%=moreurl%&gt;&quot;</font><font color="black">&gt;More&nbsp;Results&gt;&gt;&lt;/a&gt;&lt;/td&gt;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&lt;/tr&gt;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&lt;%<br />
&nbsp;&nbsp;&nbsp;&nbsp;}<br />
&nbsp;&nbsp;&nbsp;&nbsp;%&gt;<br />
&lt;/table&gt;</p>
<p>&lt;%&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="darkgreen">//then&nbsp;include&nbsp;our&nbsp;footer.<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="blue">if&nbsp;</font><font color="black">(searcher&nbsp;!</font><font color="blue">=&nbsp;null</font><font color="black">)&nbsp;{<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;searcher.</font><font color="blue">close</font><font color="black">()</font><font color="blue">;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">}<br />
%&gt;</font><font color="black"><br />
</font>
</p>
<p>
<span style="font-family: tahoma; font-size: 8pt; color: #808080">Colorized by: <a style="color: #808080" href="http://www.carlosag.net/Tools/CodeColorizer/">CarlosAg.CodeColorizer</a></span>
</p>
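<p>
The paging arithmetic used in the results page can be isolated into a small helper. The following is a minimal sketch in plain Java and is not part of the downloadable projects: the class name PageWindow and its method names are made up for illustration, while startindex, maxpage and the results.jsp parameters follow the JSP code. Unlike the JSP, the sketch returns 0 rather than a negative length when the start index is past the last hit.
</p>

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper (not in the original projects) mirroring the JSP paging logic.
class PageWindow {

    // Number of hits to show on this page: maxpage, or fewer on the last page.
    static int pageLength(int startindex, int maxpage, int totalHits) {
        int remaining = totalHits - startindex;
        if (remaining <= 0) {
            return 0; // start index is past the last hit
        }
        return Math.min(maxpage, remaining);
    }

    // True when hits remain after this page, i.e. a "More Results" link is needed.
    static boolean hasMore(int startindex, int maxpage, int totalHits) {
        return startindex + maxpage < totalHits;
    }

    // Builds the link to the next page, URL-encoding the user's query as UTF-8.
    static String moreUrl(String query, int startindex, int maxpage) {
        try {
            return "results.jsp?query=" + URLEncoder.encode(query, "UTF-8")
                    + "&maxresults=" + maxpage
                    + "&startat=" + (startindex + maxpage);
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be present on every Java platform.
            throw new IllegalStateException("UTF-8 not supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(pageLength(100, 50, 120));      // prints 20
        System.out.println(hasMore(100, 50, 120));         // prints false
        System.out.println(moreUrl("open source", 0, 50));
    }
}
```

<p>
Keeping the arithmetic in one place makes it easy to check the edge cases (last page, start index beyond the results) without running the web application.
</p>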
<p>
In order to see all of these components working together, I have created two sample NetBeans projects: a standalone application that does the crawling and indexing, and a web project containing the search pages. The web application uses the GlassFish (based on Tomcat) application server, but this can easily be changed to your preferred application server. Even though these applications are basic, they are a perfectly adequate solution for indexing and searching sites with many thousands of pages. With the current configuration a new index overwrites the previous one. This can be changed by passing false instead of true as the create flag of the IndexWriter constructor, which appends to the existing index and so allows multiple sites to be indexed into one index.
</p>
<p>
If there is sufficient interest and I have the time, I will create a C# .NET web client that can query the index created in Java. This can be achieved using the Lucene.Net API.
</p>
<p>
The code for the two projects can be downloaded by clicking on the &quot;Download&quot; button <a href="http://code.assembla.com/JSearchEngine/subversion/nodes#">here</a>.
</p>
<p>
It can also be checked out of the following svn repository; no user name or password is required: http://svn.assembla.com/svn/JSearchEngine
</p>
<p>
The code is provided &quot;as is&quot; with no express or implied warranty; however, every attempt has been made to ensure accuracy.
</p>
<p>
You are free to use this code for commercial/non-commercial use as long as you abide by the terms of the Lucene and Html Parser licences.
</p>
<p>
<a href="/blog/post/Open-Source-Enterprise-Search-Review.aspx">-&gt; part 1</a><br />
<a href="/blog/post/Open-Source-Enterprise-Search-Review-Part-Two.aspx">-&gt; part 2</a><br />
<a href="/blog/post/Open-Source-Enterprise-Search-Review-Part-Three.aspx">-&gt; part 3</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.tek-dev.com/blog/post/Open-Source-Enterprise-Search-Review-Part-Four.aspx/feed</wfw:commentRss>
		</item>
		<item>
		<title>Open Source Enterprise Search Review Part Three</title>
		<link>http://www.tek-dev.com/blog/post/Open-Source-Enterprise-Search-Review-Part-Three.aspx</link>
		<comments>http://www.tek-dev.com/blog/post/Open-Source-Enterprise-Search-Review-Part-Three.aspx#comments</comments>
		<pubDate>Fri, 20 Mar 2009 23:45:04 +0000</pubDate>
		<dc:creator>steve</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.tek-dev.com/blog/?p=7</guid>
		<description><![CDATA[
In the last part we looked at the Lucene architecture and how Lucene documents are composed. In this part we will concentrate on crawling a website and then adding the crawled pages to a Lucene index. In part 1 we discussed the notion of recursively crawling web pages to eventually find all of the [...]]]></description>
			<content:encoded><![CDATA[<p>
In the last part we looked at the Lucene architecture and how Lucene documents are composed. In this part we will concentrate on crawling a website and then adding the crawled pages to a Lucene index. In <a href="/blog/post/Open-Source-Enterprise-Search-Review.aspx">part 1</a> we discussed the notion of recursively crawling web pages to eventually find all of the pages and links in a website. We can achieve this by implementing a recursive crawling/indexing function. There are four important things going on in the following piece of code:
</p>
<ol>
<li>Go to starting page, index the page then follow all links on the page&nbsp;</li>
<li>After following the links, index the linked pages, and follow the links on those pages. </li>
<li>Repeat step 2&nbsp;</li>
<li>Do not index pages that have already been indexed, or pages from other sites. </li>
</ol>
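<p>The four steps can be sketched without any network or Lucene code. In this minimal illustration the site is a hard-coded map from each page to the links found on it, and "indexing" a page just records its URL. The class name CrawlSketch, its methods and the example.com addresses are all hypothetical and only serve to show the shape of the recursion.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of the recursive crawl; not the real indexer.
class CrawlSketch {

    // "Indexes" url, then recurses into unvisited links that belong to the same site.
    static void crawl(String url, String sitePrefix,
                      Map<String, List<String>> linksByPage, Set<String> indexed) {
        indexed.add(url); // steps 1 and 2: index the current page
        List<String> links = linksByPage.get(url);
        if (links == null) {
            return; // nothing known about this page, so no links to follow
        }
        for (String link : links) {
            // step 4: skip pages already indexed and pages from other sites
            if (!indexed.contains(link) && link.startsWith(sitePrefix)) {
                crawl(link, sitePrefix, linksByPage, indexed); // step 3: repeat
            }
        }
    }

    // Stand-in for a real site: each page maps to the links found on it.
    static Map<String, List<String>> sampleSite() {
        Map<String, List<String>> site = new HashMap<String, List<String>>();
        site.put("http://example.com/", Arrays.asList(
                "http://example.com/a", "http://other.example.org/", "http://example.com/b"));
        site.put("http://example.com/a", Arrays.asList(
                "http://example.com/", "http://example.com/b"));
        site.put("http://example.com/b", new ArrayList<String>());
        return site;
    }

    public static void main(String[] args) {
        Set<String> indexed = new LinkedHashSet<String>();
        crawl("http://example.com/", "http://example.com/", sampleSite(), indexed);
        // prints [http://example.com/, http://example.com/a, http://example.com/b]
        System.out.println(indexed);
    }
}
```

<p>The sketch keeps the already-indexed pages in a set so that the "seen before" check stays cheap as the site grows. Each page is indexed exactly once even though the pages link back to each other.</p>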
<p>The recursive crawler/indexer also makes use of an object called LinkParser, which uses the <a href="http://htmlparser.sourceforge.net/">html parser</a> library to extract all of the links from a particular web page.</p>
<p>In order to write a document to the index, Lucene requires an Analyzer, which processes the content before it is added to the index. In this example we use the StandardAnalyzer, which is little more than a string tokenizer: it splits the text into tokens, lower-cases them and drops common English stop words.</p>
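<p>To give a feel for what happens to text before it reaches the index, here is a rough plain-Java sketch. It is not Lucene's StandardAnalyzer and does not reproduce its rules: the class TokenizeSketch and its tokenize method are hypothetical, and the sketch only lower-cases the input and splits it on anything that is not a letter or digit.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical, simplified stand-in for an analyzer; not Lucene code.
class TokenizeSketch {

    // Lower-cases the text and splits it into runs of ASCII letters and digits.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!part.isEmpty()) { // split can yield an empty leading element
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [open, source, enterprise, search, part, 3]
        System.out.println(tokenize("Open Source Enterprise-Search, Part 3!"));
    }
}
```

<p>Because the same processing is applied to the query at search time, a search for "Enterprise" matches a page containing "enterprise".</p>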
<div class="code">
<font color="blue">package&nbsp;</font><font color="black">jsearchengine</font><font color="blue">;</p>
<p>import&nbsp;</font><font color="black">org.apache.lucene.analysis.standard.StandardAnalyzer</font><font color="blue">;<br />
import&nbsp;</font><font color="black">org.apache.lucene.</font><font color="blue">document</font><font color="black">.Document</font><font color="blue">;<br />
import&nbsp;</font><font color="black">org.apache.lucene.index.IndexWriter</font><font color="blue">;</p>
<p>import&nbsp;</font><font color="black">java.net.URL</font><font color="blue">;<br />
import&nbsp;</font><font color="black">java.util.ArrayList</font><font color="blue">;</p>
<p>public&nbsp;class&nbsp;</font><font color="black">Main&nbsp;{</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="blue">private&nbsp;static&nbsp;</font><font color="black">IndexWriter&nbsp;writer</font><font color="blue">;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="darkgreen">//&nbsp;new&nbsp;index&nbsp;being&nbsp;built<br />
&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="blue">private&nbsp;static&nbsp;</font><font color="black">ArrayList&nbsp;indexed</font><font color="blue">;<br />
&nbsp;&nbsp;&nbsp;&nbsp;private&nbsp;static&nbsp;</font><font color="black">String&nbsp;beginDomain</font><font color="blue">;</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;public&nbsp;static&nbsp;</font><font color="black">void&nbsp;main(String[]&nbsp;args)&nbsp;throws&nbsp;Exception&nbsp;{</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;String&nbsp;index&nbsp;</font><font color="blue">=&nbsp;</font><font color="#808080">&quot;/opt/lucene/index&quot;</font><font color="blue">;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;boolean&nbsp;</font><font color="black">create&nbsp;</font><font color="blue">=&nbsp;true;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">String&nbsp;link&nbsp;</font><font color="blue">=&nbsp;</font><font color="#808080">&quot;http://www.tek-dev.com/&quot;</font><font color="blue">;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">beginDomain&nbsp;</font><font color="blue">=&nbsp;</font><font color="black">Domain(link)</font><font color="blue">;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;System</font><font color="black">.out.println(beginDomain)</font><font color="blue">;</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">writer&nbsp;</font><font color="blue">=&nbsp;new&nbsp;</font><font color="black">IndexWriter(index,&nbsp;</font><font color="blue">new&nbsp;</font><font color="black">StandardAnalyzer(),&nbsp;create,<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="blue">new&nbsp;</font><font color="black">IndexWriter.MaxFieldLength(</font><font color="maroon">1000000</font><font color="black">))</font><font color="blue">;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">indexed&nbsp;</font><font color="blue">=&nbsp;new&nbsp;</font><font color="black">ArrayList()</font><font color="blue">;</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">indexDocs(link)</font><font color="blue">;</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;System</font><font color="black">.out.println(</font><font color="#808080">&quot;Optimizing&#8230;&quot;</font><font color="black">)</font><font color="blue">;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">writer.optimize()</font><font color="blue">;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">writer.</font><font color="blue">close</font><font color="black">()</font><font color="blue">;</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">}</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="blue">private&nbsp;static&nbsp;</font><font color="black">void&nbsp;indexDocs(String&nbsp;url)&nbsp;throws&nbsp;Exception&nbsp;{</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="darkgreen">//index&nbsp;page<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">Document&nbsp;doc&nbsp;</font><font color="blue">=&nbsp;</font><font color="black">HTMLDocument.Document(url)</font><font color="blue">;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;System</font><font color="black">.out.println(</font><font color="#808080">&quot;adding&nbsp;&quot;&nbsp;</font><font color="black">+&nbsp;doc.</font><font color="blue">get</font><font color="black">(</font><font color="#808080">&quot;path&quot;</font><font color="black">))</font><font color="blue">;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;try&nbsp;</font><font color="black">{<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;indexed.add(doc.</font><font color="blue">get</font><font color="black">(</font><font color="#808080">&quot;path&quot;</font><font color="black">))</font><font color="blue">;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">writer.addDocument(doc)</font><font color="blue">;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="darkgreen">//&nbsp;add&nbsp;docs&nbsp;unconditionally<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;//TODO:&nbsp;only&nbsp;add&nbsp;html&nbsp;docs<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;//and&nbsp;create&nbsp;other&nbsp;doc&nbsp;types</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;//get&nbsp;all&nbsp;links&nbsp;on&nbsp;the&nbsp;page&nbsp;then&nbsp;index&nbsp;them<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">LinkParser&nbsp;lp&nbsp;</font><font color="blue">=&nbsp;new&nbsp;</font><font color="black">LinkParser(url)</font><font color="blue">;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">URL[]&nbsp;links&nbsp;</font><font color="blue">=&nbsp;</font><font color="black">lp.ExtractLinks()</font><font color="blue">;</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;for&nbsp;</font><font color="black">(URL&nbsp;l&nbsp;:&nbsp;links)&nbsp;{<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="darkgreen">//make&nbsp;sure&nbsp;the&nbsp;url&nbsp;hasnt&nbsp;already&nbsp;been&nbsp;indexed<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;//make&nbsp;sure&nbsp;the&nbsp;url&nbsp;contains&nbsp;the&nbsp;home&nbsp;domain<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;//ignore&nbsp;urls&nbsp;with&nbsp;a&nbsp;querystrings&nbsp;by&nbsp;excluding&nbsp;&quot;?&quot;&nbsp;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="blue">if&nbsp;</font><font color="black">((!indexed.contains(l.toURI().toString()))&nbsp;&amp;&amp;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(l.toURI().toString().contains(beginDomain))&nbsp;&amp;&amp;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(!l.toURI().toString().contains(</font><font color="#808080">&quot;?&quot;</font><font color="black">)))&nbsp;{<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="darkgreen">//don&#39;t&nbsp;index&nbsp;zip&nbsp;files<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="blue">if&nbsp;</font><font color="black">(!l.toURI().toString().endsWith(</font><font color="#808080">&quot;.zip&quot;</font><font color="black">))&nbsp;{<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="blue">System</font><font color="black">.out.print(l.toURI().toString())</font><font color="blue">;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">indexDocs(l.toURI().toString())</font><font color="blue">;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">}<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;</font><font color="blue">catch&nbsp;</font><font color="black">(Exception&nbsp;e)&nbsp;{<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="blue">System</font><font color="black">.out.println(e.toString())</font><font color="blue">;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">}<br />
&nbsp;&nbsp;&nbsp;&nbsp;}</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="blue">private&nbsp;static&nbsp;</font><font color="black">String&nbsp;Domain(String&nbsp;url)&nbsp;{<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="blue">int&nbsp;</font><font color="black">firstDot&nbsp;</font><font color="blue">=&nbsp;</font><font color="black">url.indexOf(</font><font color="#808080">&quot;.&quot;</font><font color="black">)</font><font color="blue">;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;int&nbsp;</font><font color="black">lastDot&nbsp;</font><font color="blue">=&nbsp;</font><font color="black">url.lastIndexOf(</font><font color="#808080">&quot;.&quot;</font><font color="black">)</font><font color="blue">;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return&nbsp;</font><font color="black">url.substring(firstDot&nbsp;+&nbsp;</font><font color="maroon">1</font><font color="black">,&nbsp;lastDot)</font><font color="blue">;<br />
&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">}<br />
}</font>
</div>
<p>
The following is the link parser object used in the above code:
</p>
<div class="code">
<font color="blue">import&nbsp;</font><font color="black">org.htmlparser.NodeFilter</font><font color="blue">;<br />
import&nbsp;</font><font color="black">org.htmlparser.Parser</font><font color="blue">;<br />
import&nbsp;</font><font color="black">org.htmlparser.filters.NodeClassFilter</font><font color="blue">;<br />
import&nbsp;</font><font color="black">org.htmlparser.tags.LinkTag</font><font color="blue">;<br />
import&nbsp;</font><font color="black">org.htmlparser.util.NodeList</font><font color="blue">;<br />
import&nbsp;</font><font color="black">org.htmlparser.util.ParserException</font><font color="blue">;</p>
<p>import&nbsp;</font><font color="black">java.util.Vector</font><font color="blue">;<br />
import&nbsp;</font><font color="black">java.net.URL</font><font color="blue">;<br />
import&nbsp;</font><font color="black">java.net.MalformedURLException</font><font color="blue">;</p>
<p></font><font color="darkgreen">/**<br />
&nbsp;*<br />
&nbsp;*&nbsp;@author&nbsp;Stephen.Lane<br />
&nbsp;*/<br />
</font><font color="blue">public&nbsp;class&nbsp;</font><font color="black">LinkParser&nbsp;{</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;String&nbsp;url</font><font color="blue">;<br />
&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">Parser&nbsp;parser</font><font color="blue">;<br />
&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">NodeFilter&nbsp;filter</font><font color="blue">;<br />
&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">NodeList&nbsp;list</font><font color="blue">;<br />
&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">LinkTag&nbsp;link</font><font color="blue">;<br />
&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">URL[]&nbsp;linkArray</font><font color="blue">;<br />
&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">Vector&nbsp;vector</font><font color="blue">;</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;public&nbsp;</font><font color="black">LinkParser(String&nbsp;Url)&nbsp;{<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;url&nbsp;</font><font color="blue">=&nbsp;</font><font color="black">Url</font><font color="blue">;<br />
&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">}</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="blue">public&nbsp;</font><font color="black">URL[]&nbsp;ExtractLinks()&nbsp;{<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;filter&nbsp;</font><font color="blue">=&nbsp;new&nbsp;</font><font color="black">NodeClassFilter(LinkTag.</font><font color="blue">class</font><font color="black">)</font><font color="blue">;</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;try&nbsp;</font><font color="black">{<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;parser&nbsp;</font><font color="blue">=&nbsp;new&nbsp;</font><font color="black">Parser(url)</font><font color="blue">;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">list&nbsp;</font><font color="blue">=&nbsp;</font><font color="black">parser.extractAllNodesThatMatch(filter)</font><font color="blue">;</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">vector&nbsp;</font><font color="blue">=&nbsp;new&nbsp;</font><font color="black">Vector()</font><font color="blue">;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;for&nbsp;</font><font color="black">(</font><font color="blue">int&nbsp;</font><font color="black">i&nbsp;</font><font color="blue">=&nbsp;</font><font color="maroon">0</font><font color="blue">;&nbsp;</font><font color="black">i&nbsp;&lt;&nbsp;list.size()</font><font color="blue">;&nbsp;</font><font color="black">i++)&nbsp;{<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="blue">try&nbsp;</font><font color="black">{<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;link&nbsp;</font><font color="blue">=&nbsp;</font><font color="black">(LinkTag)&nbsp;list.elementAt(i)</font><font color="blue">;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">vector.add(</font><font color="blue">new&nbsp;</font><font color="black">URL(link.getLink()))</font><font color="blue">;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">}&nbsp;</font><font color="blue">catch&nbsp;</font><font color="black">(MalformedURLException&nbsp;murle)&nbsp;{<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;linkArray&nbsp;</font><font color="blue">=&nbsp;new&nbsp;</font><font color="black">URL[vector.size()]</font><font color="blue">;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">vector.copyInto(linkArray)</font><font color="blue">;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">}&nbsp;</font><font color="blue">catch&nbsp;</font><font color="black">(ParserException&nbsp;e)&nbsp;{<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;e.printStackTrace()</font><font color="blue">;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">}</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="blue">return&nbsp;</font><font color="black">(linkArray)</font><font color="blue">;<br />
&nbsp;&nbsp;&nbsp;&nbsp;</font><font color="black">}<br />
}<br />
</font>
</div>
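<p>
One caveat about the Domain helper in the crawler listing above: it returns whatever lies between the first and last dot of the string, so a URL whose path contains a dot (for example one ending in .aspx) would yield part of the path as well. The following is a minimal sketch of a host-based alternative, assuming absolute URLs. The HostName class is hypothetical and not part of the original crawler, and unlike Domain it keeps the top-level domain.
</p>

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper for illustration; not part of the crawler above.
class HostName {

    // Returns the host of an absolute URL without a leading "www.",
    // or an empty string when no host can be parsed.
    static String of(String url) {
        try {
            String host = new URI(url).getHost();
            if (host == null) {
                return "";
            }
            return host.startsWith("www.") ? host.substring(4) : host;
        } catch (URISyntaxException e) {
            return "";
        }
    }

    public static void main(String[] args) {
        System.out.println(of("http://www.tek-dev.com/blog/post/example.aspx"));
    }
}
```

<p>
Comparing hosts rather than substrings also gives a simple way to decide whether a link stays on the site being crawled.
</p>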
<p>
Now that the pages have been crawled and indexed, all that remains is to query the index and display some results. We will look at that in <a href="/blog/post/Open-Source-Enterprise-Search-Review-Part-Four.aspx">part 4</a>.
</p>
<p>
<a href="http://www.tek-dev.com/blog/post/Open-Source-Enterprise-Search-Review.aspx">-&gt; part 1</a><br />
<a href="http://www.tek-dev.com/blog/post/Open-Source-Enterprise-Search-Review-Part-Two.aspx">-&gt; part 2</a><br />
<a href="http://www.tek-dev.com/blog/post/Open-Source-Enterprise-Search-Review-Part-Four.aspx">-&gt; part 4</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.tek-dev.com/blog/post/Open-Source-Enterprise-Search-Review-Part-Three.aspx/feed</wfw:commentRss>
		</item>
		<item>
		<title>Open Source Enterprise Search Review Part Two</title>
		<link>http://www.tek-dev.com/blog/post/Open-Source-Enterprise-Search-Review-Part-Two.aspx</link>
		<comments>http://www.tek-dev.com/blog/post/Open-Source-Enterprise-Search-Review-Part-Two.aspx#comments</comments>
		<pubDate>Fri, 20 Mar 2009 23:33:51 +0000</pubDate>
		<dc:creator>steve</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.tek-dev.com/blog/?p=5</guid>
		<description><![CDATA[In the first part of this four-part series we looked at some of the fundamentals of enterprise search. In this second part we will take a look at some of the available open source technologies that we can leverage to implement those concepts.
Without a doubt the most challenging part of a search [...]]]></description>
			<content:encoded><![CDATA[<p>In the <a href="/blog/post/Open-Source-Enterprise-Search-Review.aspx">first</a> part of this four-part series we looked at some of the fundamentals of enterprise search. In this second part we will take a look at some of the available open source technologies that we can leverage to implement those concepts.</p>
<p>Without a doubt the most challenging part of a search engine is the way we index content, and subsequently how we interrogate that index. One of the main challenges in search is providing results that are relevant to a search term. Take for example the word &#8220;Run&#8221;: if we search for &#8220;Run&#8221; we will also be interested in documents that contain words like &#8220;Runnable&#8221;, &#8220;Running&#8221;, &#8220;Runs&#8221; etc. Thankfully there are indexing engines that we can make use of that handle all of this logic transparently. <a href="http://lucene.apache.org/">Lucene</a> is an Apache Software Foundation indexing project, and is widely regarded as the best open source indexing project available.</p>
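<p>To make the &#8220;Run&#8221; example concrete, here is a deliberately naive sketch of the idea behind stemming: reducing word variants to a common root so that they match the same index entry. This is not Lucene's algorithm. The NaiveStemmer class and its suffix list are assumptions made purely for illustration, and the rules will mangle many ordinary English words.</p>

```java
import java.util.Locale;

// Naive suffix stripper for illustration only; real stemmers use far more careful rules.
class NaiveStemmer {

    private static final String[] SUFFIXES = {"able", "ing", "s"};

    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        for (String suffix : SUFFIXES) {
            int cut = w.length() - suffix.length();
            // Keep at least three characters so short words are left alone.
            if (w.endsWith(suffix) && cut >= 3) {
                String root = w.substring(0, cut);
                // "running" -> "runn" -> "run": drop a doubled final letter.
                int last = root.length() - 1;
                if (root.charAt(last) == root.charAt(last - 1)) {
                    root = root.substring(0, last);
                }
                return root;
            }
        }
        return w;
    }

    public static void main(String[] args) {
        for (String word : new String[] {"Run", "Runs", "Running", "Runnable"}) {
            System.out.println(word + " -> " + stem(word));
        }
    }
}
```

<p>If the same reduction is applied both when indexing and when querying, all four words land on the same entry.</p>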
<p>In order to implement Lucene we must first examine its basic architecture. Lucene is very cleverly designed: it appears simple enough in nature, but the devil is definitely in the detail! The objects that Lucene adds to its index are intuitively called documents. A document is made up of fields, and each field has a field name and a field value. For example, if we consider a web page, each of its attributes/tags can be mapped to a field, such as its title, body etc. If we take the title field, the field name is &#8220;title&#8221; and the field value is the actual title. When we query the index, we are returned a set of documents that match the search term. Any of the fields in the document can be returned as part of the result set and bound to a pager or repeater. The following code shows an HTML document type with path (the URL), contents, title and summary fields. The example uses the <a href="http://htmlparser.sourceforge.net/">HTML Parser</a> library to extract text from web pages.</p>
<div class="code"><span style="color: blue;">import </span><span style="color: black;">org.htmlparser.beans.StringBean</span><span style="color: blue;">;</p>
<p>import </span><span style="color: black;">org.htmlparser.Parser</span><span style="color: blue;">;</p>
<p>import </span><span style="color: black;">org.htmlparser.NodeFilter</span><span style="color: blue;">;</p>
<p>import </span><span style="color: black;">org.htmlparser.filters.TagNameFilter</span><span style="color: blue;">;</p>
<p>import </span><span style="color: black;">org.htmlparser.util.ParserException</span><span style="color: blue;">;</span></p>
<p><span style="color: blue;">import </span><span style="color: black;">java.io.*</span><span style="color: blue;">;</p>
<p>import </span><span style="color: black;">org.apache.lucene.</span><span style="color: blue;">document</span><span style="color: black;">.*</span><span style="color: blue;">;</span></p>
<p><span style="color: darkgreen;">/** A utility for making Lucene Documents for HTML documents. */</p>
<p></span><span style="color: blue;">public class </span><span style="color: black;">HTMLDocument {</span></p>
<p><span style="color: black;"> </span><span style="color: blue;">public static </span><span style="color: black;">Document Document(String url)</p>
<p>throws IOException, InterruptedException {</p>
<p></span><span style="color: darkgreen;">// make a new, empty document</p>
<p></span><span style="color: black;">Document doc </span><span style="color: blue;">= new </span><span style="color: black;">Document()</span><span style="color: blue;">;</p>
<p></span><span style="color: black;">String title </span><span style="color: blue;">= new </span><span style="color: black;">String()</span><span style="color: blue;">;</p>
<p></span><span style="color: black;">String summary </span><span style="color: blue;">= new </span><span style="color: black;">String()</span><span style="color: blue;">;</span></p>
<p><span style="color: blue;"> </span><span style="color: darkgreen;">// Add the url as a field named &quot;path&quot;.  Use a field that is</p>
<p>// indexed (i.e. searchable), but don't tokenize the field into words.</p>
<p></span><span style="color: black;">doc.add(</span><span style="color: blue;">new </span><span style="color: black;">Field(</span><span style="color: #808080;">&quot;path&quot;</span><span style="color: black;">, url, Field.Store.YES, Field.Index.NOT_ANALYZED))</span><span style="color: blue;">;</p>
<p></span><span style="color: darkgreen;">// Add the tag-stripped contents as a Reader-valued Text field so it will</p>
<p>// get tokenized and indexed.</p>
<p></span><span style="color: black;">StringBean sb </span><span style="color: blue;">= new </span><span style="color: black;">StringBean()</span><span style="color: blue;">;</p>
<p></span><span style="color: black;">sb.setLinks(</span><span style="color: blue;">false</span><span style="color: black;">)</span><span style="color: blue;">;</p>
<p></span><span style="color: black;">sb.setURL(url)</span><span style="color: blue;">;</span></p>
<p><span style="color: blue;"> </span><span style="color: black;">StringReader sr </span><span style="color: blue;">= new </span><span style="color: black;">StringReader(sb.getStrings())</span><span style="color: blue;">;</span></p>
<p><span style="color: blue;"> </span><span style="color: black;">doc.add(</span><span style="color: blue;">new </span><span style="color: black;">Field(</span><span style="color: #808080;">&quot;contents&quot;</span><span style="color: black;">, sr))</span><span style="color: blue;">;</span></p>
<p><span style="color: blue;"> </span><span style="color: black;">Parser bParser</span><span style="color: blue;">;</p>
<p></span><span style="color: black;">NodeFilter bFilter</span><span style="color: blue;">;</span></p>
<p><span style="color: blue;"> try </span><span style="color: black;">{</p>
<p>bParser </span><span style="color: blue;">= new </span><span style="color: black;">Parser()</span><span style="color: blue;">;</p>
<p></span><span style="color: black;">bFilter </span><span style="color: blue;">= new </span><span style="color: black;">TagNameFilter(</span><span style="color: #808080;">&quot;TITLE&quot;</span><span style="color: black;">)</span><span style="color: blue;">;</p>
<p></span><span style="color: black;">bParser.setResource(url)</span><span style="color: blue;">;</p>
<p></span><span style="color: black;">title </span><span style="color: blue;">= </span><span style="color: black;">bParser.parse(bFilter).asString()</span><span style="color: blue;">;</span></p>
<p><span style="color: blue;"> </span><span style="color: black;">} </span><span style="color: blue;">catch </span><span style="color: black;">(ParserException e) {</p>
<p>e.printStackTrace()</span><span style="color: blue;">;</p>
<p></span><span style="color: black;">}</span></p>
<p><span style="color: black;"> </span><span style="color: blue;">try </span><span style="color: black;">{</span></p>
<p><span style="color: black;"> bParser </span><span style="color: blue;">= new </span><span style="color: black;">Parser()</span><span style="color: blue;">;</p>
<p></span><span style="color: black;">bFilter </span><span style="color: blue;">= new </span><span style="color: black;">TagNameFilter(</span><span style="color: #808080;">&quot;BODY&quot;</span><span style="color: black;">)</span><span style="color: blue;">;</p>
<p></span><span style="color: black;">bParser.setResource(url)</span><span style="color: blue;">;</p>
<p>try </span><span style="color: black;">{</p>
<p>summary </span><span style="color: blue;">= </span><span style="color: black;">bParser.parse(bFilter).asString().substring(</span><span style="color: maroon;">0</span><span style="color: black;">, </span><span style="color: maroon;">200</span><span style="color: black;">)</span><span style="color: blue;">;</p>
<p></span><span style="color: black;">} </span><span style="color: blue;">catch </span><span style="color: black;">(StringIndexOutOfBoundsException e) {</p>
<p>summary </span><span style="color: blue;">= </span><span style="color: #808080;">&quot;&quot;</span><span style="color: blue;">;</p>
<p></span><span style="color: black;">}</span></p>
<p><span style="color: black;"> } </span><span style="color: blue;">catch </span><span style="color: black;">(ParserException e) {</p>
<p>e.printStackTrace()</span><span style="color: blue;">;</p>
<p></span><span style="color: black;">}</span></p>
<p><span style="color: black;"> </span><span style="color: darkgreen;">// Add the title as a stored, searchable field; store the summary without indexing it.</p>
<p></span><span style="color: black;">doc.add(</span><span style="color: blue;">new </span><span style="color: black;">Field(</span><span style="color: #808080;">&quot;title&quot;</span><span style="color: black;">, title, Field.Store.YES, Field.Index.ANALYZED))</span><span style="color: blue;">;</p>
<p></span><span style="color: black;">doc.add(</span><span style="color: blue;">new </span><span style="color: black;">Field(</span><span style="color: #808080;">&quot;summary&quot;</span><span style="color: black;">, summary, Field.Store.YES, Field.Index.NO))</span><span style="color: blue;">;</span></p>
<p><span style="color: blue;"> return </span><span style="color: black;">doc</span><span style="color: blue;">;</p>
<p></span><span style="color: black;">}</span></p>
<p><span style="color: black;"> </span><span style="color: blue;">private </span><span style="color: black;">HTMLDocument() {</p>
<p>}</p>
<p>}</span></div>
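<p>One detail in the listing above is worth noting: when the page body is shorter than 200 characters, substring(0, 200) throws a StringIndexOutOfBoundsException and the summary ends up empty. A minimal sketch of one way to keep short bodies as their own summary follows. The Summaries class and its summarize method are hypothetical names, not part of the original code.</p>

```java
// Hypothetical helper for illustration; not part of the HTMLDocument class above.
class Summaries {

    // Returns at most maxChars characters of the trimmed text,
    // without throwing when the text is shorter than maxChars.
    static String summarize(String text, int maxChars) {
        if (text == null) {
            return "";
        }
        String trimmed = text.trim();
        return trimmed.substring(0, Math.min(maxChars, trimmed.length()));
    }

    public static void main(String[] args) {
        System.out.println(summarize("A short page body.", 200));
    }
}
```

<p>With a helper like this the inner try/catch around the summary would no longer be needed.</p>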
<p>To create these HTML documents for a whole web site, we need to implement some form of crawling mechanism. We will look at this in <a href="/blog/post/Open-Source-Enterprise-Search-Review-Part-Three.aspx">part three</a>.</p>
<p><a href="/blog/post/Open-Source-Enterprise-Search-Review.aspx">-&gt; part 1</a> <a href="/blog/post/Open-Source-Enterprise-Search-Review-Part-Three.aspx">-&gt; part 3</a><br />
<a href="/blog/post/Open-Source-Enterprise-Search-Review-Part-Four.aspx">-&gt; part 4</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.tek-dev.com/blog/post/Open-Source-Enterprise-Search-Review-Part-Two.aspx/feed</wfw:commentRss>
		</item>
		<item>
		<title>Open Source Enterprise Search Review Part One</title>
		<link>http://www.tek-dev.com/blog/post/Open-Source-Enterprise-Search-Review.aspx</link>
		<comments>http://www.tek-dev.com/blog/post/Open-Source-Enterprise-Search-Review.aspx#comments</comments>
		<pubDate>Fri, 20 Mar 2009 23:05:18 +0000</pubDate>
		<dc:creator>steve</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.tek-dev.com/blog/?p=3</guid>
		<description><![CDATA[
When we type our search term into a site search box and hit enter, we expect to be provided with a list of relevant results from the site in question. I am not talking about a search engine such as G&#163;$Gle or Y$%oo; I&#8217;m talking about the search boxes that [...]]]></description>
			<content:encoded><![CDATA[<p>
<span style="font-size: 10pt; font-family: Arial">When we type our search term into a site search box and hit enter, we expect to be provided with a list of relevant results from the site in question. I am not talking about a search engine such as G&pound;$Gle or Y$%oo; I&rsquo;m talking about the search boxes that decorate most good websites and forums. So how do they work? Excellent question! Most users don&rsquo;t know or care, but as software developers we are intrigued by this seemingly simple yet important functionality. Whether it be a tool for searching the entire internet, a corporate intranet or a small website, the principle is the same. If we think about it long enough most of us will come up with some sort of approximation as to how the process works. It might go something like &ldquo;Find all the available data sources and move them into an archive, store the data in some sort of searchable format, and make a client to search this database &hellip;&rdquo; Although this sounds simple, it is the basic principle behind how all major search engines work. Put more formally, the three steps are Crawling, Indexing and Searching. There are a variety of tools which can be mixed and matched to complete these activities, and of course they sound much better when they are free or open source, so that&rsquo;s what we will focus on in this article.</span><span style="font-size: 10pt; font-family: Arial">&nbsp;</span>
</p>
<p><span style="font-size: 10pt; font-family: Arial"></span></p>
<ul>
<li><span style="font-size: 10pt; font-family: Arial">Crawling</span></li>
</ul>
<p>
<span style="font-size: 10pt; font-family: Arial">Crawling, as mentioned above, is the process of finding all available data sources and storing them in some archive. The method used for finding the data sources will vary depending on whether we are crawling a network folder, a website or the internet. </span><span style="font-size: 10pt; font-family: Arial">&nbsp;</span><span style="font-size: 10pt; font-family: Arial">If we want to crawl a file system folder such as a SharePoint&trade; archive, all we need to do is loop through all of the files in the directory and place them into our archive. </span><span style="font-size: 10pt; font-family: Arial">&nbsp;</span><span style="font-size: 10pt; font-family: Arial">If we want to take a snapshot of the data in one website to search, we can start by recursively following all of the links on the homepage of the site and storing a copy of each page as we go. Following links recursively means following all the links on a page, then following all the links on the target pages, and so on. It is important not to follow links that leave the site, as that would cause data external to the site to be stored. </span><span style="font-size: 10pt; font-family: Arial">&nbsp;</span><span style="font-size: 10pt; font-family: Arial">Searching the entire internet is achieved in a similar manner to a single site search. Most search engines have a facility for webmasters to submit their sites for inclusion in the search. The large search engines then recursively crawl both the internal and external links on these submitted sites. This means that even if a site isn&rsquo;t submitted to the search engine, it will still get crawled if it is linked to by a site that is included. Of course this takes a lot of CPU and bandwidth (hundreds of thousands of servers)!</span><span style="font-size: 10pt; font-family: Arial">&nbsp;</span><span style="font-size: 10pt; font-family: Arial">&nbsp;</span>
</p>
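<p><span style="font-size: 10pt; font-family: Arial">The single-site crawl described above can be sketched as follows. This is a minimal illustration, not a production crawler: it uses a queue instead of literal recursion, and an in-memory map of page to links stands in for downloading each page and extracting its links. The SiteCrawlSketch class and the example.com addresses are assumptions made for the example.</span></p>

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch; linksOnPage stands in for fetching and parsing real pages.
class SiteCrawlSketch {

    // True when both URLs parse and share the same host.
    static boolean sameSite(String a, String b) {
        try {
            String hostA = URI.create(a).getHost();
            String hostB = URI.create(b).getHost();
            return hostA != null && hostA.equalsIgnoreCase(hostB);
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    // Visits every page reachable from start without leaving its site.
    static Set<String> crawl(String start, Map<String, List<String>> linksOnPage) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(start);
        while (!queue.isEmpty()) {
            String page = queue.poll();
            if (!visited.add(page)) {
                continue; // already stored; prevents looping on circular links
            }
            for (String link : linksOnPage.getOrDefault(page, List.of())) {
                // Skip links that leave the site, as described above.
                if (sameSite(start, link) && !visited.contains(link)) {
                    queue.add(link);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Map<String, List<String>> web = Map.of(
            "http://example.com/", List.of("http://example.com/a", "http://other.org/"),
            "http://example.com/a", List.of("http://example.com/", "http://example.com/b"));
        System.out.println(crawl("http://example.com/", web));
    }
}
```

<p><span style="font-size: 10pt; font-family: Arial">The visited set is what keeps the crawl from running forever when two pages link to each other.</span></p>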
<ul style="margin-top: 0cm">
<li class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal"><span style="font-size: 10pt; font-family: Arial">Indexing</span></li>
</ul>
<p>
<span style="font-size: 10pt; font-family: Arial">&nbsp;</span><span style="font-size: 10pt; font-family: Arial">Once all of the data has been stored in a repository, the fun can start. At this stage the files and data are not very searchable, as the data is stored in a so-called &ldquo;heap&rdquo;. Searching all of this unstructured data would be very inefficient and slow. To make the content more accessible, the data needs to be stored in a structured format called an index, which is why this process is called indexing. In its simplest form an index is a sorted list of all of the words and phrases found in the retrieved content. The words and phrases are stored in alphabetical order along with their source and its rank or popularity. One of the primary issues with adding content to the index is its format: all content needs to be converted to readable text before it can be added. This is a problem when the data is stored in complex or proprietary file types. However, this can be overcome by the creation of content parsers. Often these parsers are created by the owners of proprietary file types to make their file types more accessible. </span><span style="font-size: 10pt; font-family: Arial">&nbsp;</span><span style="font-size: 10pt; font-family: Arial">Now that the content is added to a sorted index, a keyword search on that index will quickly retrieve all of the sources for that word or phrase. When there are multiple sources they will be ranked based on the popularity or rank of the source. This raises the question of how the popularity of a page is determined. There are countless algorithms from a variety of vendors, most of which are proprietary. A very famous algorithm named PageRank&trade; is used by the Google&trade; search engine. 
Google&rsquo;s exact scoring technique is a well-guarded secret, but it works on the basic premise that if a page has a lot of incoming links then it is considered popular. If the links are coming from pages that are themselves considered popular, then the receiving page is considered more popular still, or to have a higher page rank.</span><span style="font-size: 10pt; font-family: Arial">&nbsp;</span><span style="font-size: 10pt; font-family: Arial">Depending on the engine, crawling and indexing can either be done as a single step or as two separate processes. If the two are combined then the content is added to the index immediately after it is retrieved. This is more common in smaller solutions as it does not scale very well. For larger enterprise or internet engines it is much easier to manage one set of high-bandwidth servers to crawl and download content, and a second set of servers with high CPU capacity to parse and index the content.</span><span style="font-size: 10pt; font-family: Arial">&nbsp;</span>
</p>
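<p><span style="font-size: 10pt; font-family: Arial">The &ldquo;sorted list of words with their sources&rdquo; described above is often called an inverted index. The following is a minimal sketch of the idea, assuming plain text input and ignoring ranking altogether. The InvertedIndexSketch class is hypothetical and far simpler than what a real indexing engine stores.</span></p>

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative sketch of an inverted index; no ranking, phrases or stemming.
class InvertedIndexSketch {

    // Word -> sources containing it. TreeMap keeps the words in alphabetical order.
    private final Map<String, Set<String>> postings = new TreeMap<>();

    // Splits the text into lower-case words and records the source for each one.
    void add(String source, String text) {
        for (String word : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!word.isEmpty()) {
                postings.computeIfAbsent(word, k -> new TreeSet<>()).add(source);
            }
        }
    }

    // Returns the sources that contain the word, or an empty set.
    Set<String> lookup(String word) {
        return postings.getOrDefault(word.toLowerCase(Locale.ROOT), Set.of());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add("/a", "Open source search");
        index.add("/b", "Search engines index content");
        System.out.println(index.lookup("search"));
    }
}
```

<p><span style="font-size: 10pt; font-family: Arial">A lookup is a single map access no matter how many pages were indexed, which is what makes the index so much faster than scanning the raw content.</span></p>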
<ul style="margin-top: 0cm">
<li class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal"><span style="font-size: 10pt; font-family: Arial">Searching</span></li>
</ul>
<p>
<span style="font-size: 10pt; font-family: Arial"><span style="font-size: 10pt; font-family: Arial">Now that all of the content has been indexed, the hard work has been done. All a client has to do to get some search results is submit a search word or term to the index. If the index is stored in a relational database, then an SQL query can be used to retrieve the results. However, search engines tend to use specially designed, highly efficient data structures to store the index. In this case the method used to retrieve the results will depend on the implementation.</span></span><span style="font-size: 10pt; font-family: Arial">&nbsp;</span><span style="font-size: 10pt; font-family: Arial">&nbsp;</span><span style="font-size: 10pt; font-family: Arial"></span>&nbsp;<span style="font-size: 10pt; font-family: Arial">&nbsp;</span>
</p>
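<p><span style="font-size: 10pt; font-family: Arial">As a rough illustration of the search step, the sketch below answers a multi-word query against a simple word-to-sources map by keeping only the sources that contain every term. The QuerySketch class and the map layout are assumptions for the example. Real engines use their own data structures and also rank the results.</span></p>

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch; the index is assumed to map lower-case words to their sources.
class QuerySketch {

    // Returns the sources that contain every term of the query.
    static Set<String> search(Map<String, Set<String>> index, String query) {
        Set<String> results = null;
        for (String term : query.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            Set<String> hits = index.getOrDefault(term, Set.of());
            if (results == null) {
                results = new TreeSet<>(hits); // first term seeds the result set
            } else {
                results.retainAll(hits);       // later terms narrow it down
            }
        }
        return results == null ? Set.of() : results;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> index = Map.of(
            "open", Set.of("/a"),
            "search", Set.of("/a", "/b"));
        System.out.println(search(index, "open search"));
    }
}
```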
<p>
<span style="font-size: 10pt; font-family: Arial"></span><span style="font-size: 10pt; font-family: Arial">So, now that we know all of the boring stuff, let&rsquo;s get stuck in and look at some example implementations in <a href="/blog/post/Open-Source-Enterprise-Search-Review-Part-Two.aspx">part 2</a>.</span>
</p>
<p>
<a href="/blog/post/Open-Source-Enterprise-Search-Review-Part-Two.aspx">-&gt; part 2</a><br />
<a href="/blog/post/Open-Source-Enterprise-Search-Review-Part-Three.aspx">-&gt; part 3</a><br />
<a href="/blog/post/Open-Source-Enterprise-Search-Review-Part-Four.aspx">-&gt; part 4</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.tek-dev.com/blog/post/Open-Source-Enterprise-Search-Review.aspx/feed</wfw:commentRss>
		</item>
	</channel>
</rss>
