Results 1 to 10 of 17 (0.465s).

Re: Tika's outlink is not as expected - Nutch - [mail # user]
...Thank you. I just found it a minute ago and was going to write the email. ([;_]?((?i)l|j|bv_)?((?i)sid|phpsessid|sessionid)=.*?)(\?|&|#|$) Perhaps I was too tired yesterday ...
Author: Ake Tangkananond, 2012-08-15, 07:47

Re: Tika's outlink is not as expected - Nutch - [mail # user]
...Hi Ferdy, Thanks for your advice. I don't have any special filtering/normalizing rules except the standard ones. I even tried disabling all URL normalization plugins, but the result is no d...
Author: Ake Tangkananond, 2012-08-14, 16:29

Re: Tika's outlink is not as expected - Nutch - [mail # user]
...Thanks for the reply, Ferdy. The variable 'db.max.outlinks.per.page' was set to 100, and I could parse HTML fine. Regards, Ake Tangkananond On 8/14/12 6:43 PM, "Ferdy Galem...
Author: Ake Tangkananond, 2012-08-14, 11:49

Tika's outlink is not as expected - Nutch - [mail # user]
...Hi, I'm getting unexpected behavior from the Nutch parsing mechanism. Perhaps I don't really understand Nutch well. Here is what I find weird. Could you please advise? I crawl ...
Author: Ake Tangkananond, 2012-08-14, 11:15

Re: Nutch 2 encoding - Nutch - [mail # user]
...Hi, I'm debugging. I inserted code to print the encoding in the getParse function of HtmlParser.java, and it printed utf-8. So I think it might be a data store problem. What...
Author: Ake Tangkananond, 2012-08-09, 18:05

Re: Nutch 2 encoding - Nutch - [mail # user]
...Hi, Sorry for the late reply. I was trying to figure it out myself, but with no luck. I'm on HBase with a local deployment, version 0.90.6, r1295128, the working version according to the Wiki: http:/...
Author: Ake Tangkananond, 2012-08-09, 16:06

Nutch 2 encoding - Nutch - [mail # user]
...Hi all, I just wonder whether Nutch 2 is working fine with non-English characters in your deployment? Thai used to work fine for me in Nutch 1.5 but not in Nutch 2. Did I miss some...
Author: Ake Tangkananond, 2012-08-09, 14:05

Nutch plugins/feed - Nutch - [mail # user]
...Hi, I see there is an RSS parser under src/plugins, but it wasn't put into the deployment profile in src/plugins/build.xml. Is there a substitute for this parser now? Which one should I us...
Author: Ake Tangkananond, 2012-08-08, 07:54

Re: Filter out document before sending to solr index - Nutch - [mail # user]
...Cool. Thanks!! Regards, Ake Tangkananond On 8/7/12 2:58 PM, "Ferdy Galema" wrote: ...
Author: Ake Tangkananond, 2012-08-07, 08:05

Filter out document before sending to solr index - Nutch - [mail # user]
...Hi, Is it possible to filter out some documents from being indexed in Solr when executing the command "bin/nutch solrindex"? Note: Those documents are fine living in the Nutch datasto...
Author: Ake Tangkananond, 2012-08-07, 07:49