
anqiang's Column

Don't ask how the details work; the source code explains everything.

Weka Frequently Asked Questions List (Part 4)

2009-11-11 09:43:21 | Category: Weka Learning Series


1.  On distance computation in clustering

Q:

Hi...

If some of my variables are categorical and some are numeric,
then to do cluster analysis I should use Gower's distance, am I right? Are there other options?

If I use the Weka Explorer, can I choose Gower's distance? How? I couldn't find Gower's distance in the menu, or maybe I'm not looking in the right place.

Thank you so much for all your help

Best.
S.T.

 

Reply:

> If some of my variables are categorical and some are numeric,
> to do cluster analysis I should use Gower's distance, am I right?

Why not use Euclidean distance? It works as well.

> Are there other options?

Currently available distance functions in the developer version:
  Chebyshev, Edit, Euclidean, Manhattan, Minkowski


> If I use the Weka Explorer, can I choose Gower's distance?

Gower's distance is not part of Weka, but feel free to implement it
and contribute it.

Cheers, Peter
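
For anyone who wants to try implementing it, here is a minimal sketch of Gower's dissimilarity in plain Java. It is not Weka code: the class name `GowerSketch`, the `Object[]` row layout (Double for numeric, String for categorical, null for missing) and the precomputed `ranges` array are all assumptions made for illustration. Numeric attributes contribute their absolute difference divided by the attribute's range, categorical attributes contribute 0 on a match and 1 on a mismatch, and the result is the average over the attributes that could be compared.

```java
import java.util.Objects;

// Hypothetical helper, not part of Weka: Gower's dissimilarity for rows that
// mix numeric (Double) and categorical (String) values.
final class GowerSketch {

    /**
     * @param a      first row: Double for numeric, String for categorical, null for missing
     * @param b      second row, same layout as a
     * @param ranges (max - min) of each numeric attribute over the data set;
     *               entries for categorical attributes are ignored
     * @return average per-attribute dissimilarity in [0, 1]
     */
    static double distance(Object[] a, Object[] b, double[] ranges) {
        double sum = 0.0;
        int compared = 0;
        for (int k = 0; k < a.length; k++) {
            if (a[k] == null || b[k] == null) {
                continue; // skip attributes with a missing value
            }
            double d;
            if (a[k] instanceof Double && b[k] instanceof Double) {
                double x = (Double) a[k];
                double y = (Double) b[k];
                // range-normalised absolute difference; a zero range means a constant attribute
                d = ranges[k] > 0 ? Math.abs(x - y) / ranges[k] : 0.0;
            } else {
                // categorical: simple match / mismatch
                d = Objects.equals(a[k], b[k]) ? 0.0 : 1.0;
            }
            sum += d;
            compared++;
        }
        return compared == 0 ? 0.0 : sum / compared;
    }
}
```

For example, the rows (1.0, "red") and (3.0, "blue") with a numeric range of 4.0 give (0.5 + 1) / 2 = 0.75. Hooking such a function into Weka's own distance-function classes is left out here.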

2.  On clustering getting stuck in local optima with large data sets

Q:

Folks
While running K-Means (the SimpleKMeans implementation) on a large subset, I
found that the initial "random" centroids chosen from the space
are vital. Sometimes (about 20% of the time) things get stuck in local optima.
What are the ways to get K-Means out of these local optima? Some of my
thoughts were:

   - Run GA-based K-Means and allow reproduction and crossover to get
   variety and search for the global optimum
   - Use some metaheuristics

If anyone has used these and is aware of WEKA tools to do this, let me know.
Cheers
Uday

Reply:

> Can someone reply to this, please?

People will reply if they have an answer to a question. You can't
expect more from a mailing list run by volunteers and users.

> Because my data has a combination of
> categorical and continuous attributes, the centroids are stuck in local
> optima 20% of the time. Is there some way around this? Has anyone seen this or solved it?

There is no "meta-framework" available to deal with such problems.
Apart from varying the seed value for randomization there is nothing
you can do.

You can always implement your own meta-clusterer that determines via
some magic statistic whether the base clusterer is stuck in a local
minimum and does something about it. But it might involve some more work
than just that...

Cheers, Peter
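
The "vary the seed" advice can be sketched as a restart loop: run k-means several times with different seeds and keep the run with the lowest sum of squared errors. The code below is a plain-Java illustration under assumptions, not Weka's SimpleKMeans: the class `KMeansRestarts`, its method names, the use of seeds 0..restarts-1, numeric-only data and k no larger than the number of points are all choices made for this sketch.

```java
import java.util.Random;

// Hypothetical plain-Java sketch (not Weka's SimpleKMeans): restart k-means
// with different seeds and keep the run with the lowest squared error.
final class KMeansRestarts {

    /** Squared Euclidean distance between two points. */
    static double sq(double[] p, double[] q) {
        double s = 0.0;
        for (int i = 0; i < p.length; i++) {
            s += (p[i] - q[i]) * (p[i] - q[i]);
        }
        return s;
    }

    /** Index of the centroid closest to p. */
    static int nearest(double[] p, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (sq(p, centroids[c]) < sq(p, centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    /** One k-means run; the initial centroids are k distinct data points chosen with the seed. */
    static double[][] run(double[][] data, int k, long seed, int maxIter) {
        Random rnd = new Random(seed);
        int n = data.length, dim = data[0].length;
        int[] idx = new int[n];
        for (int i = 0; i < n; i++) idx[i] = i;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) { // partial Fisher-Yates shuffle, assumes k <= n
            int j = c + rnd.nextInt(n - c);
            int t = idx[c]; idx[c] = idx[j]; idx[j] = t;
            centroids[c] = data[idx[c]].clone();
        }
        for (int iter = 0; iter < maxIter; iter++) {
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (double[] p : data) { // assignment step
                int c = nearest(p, centroids);
                counts[c]++;
                for (int d = 0; d < dim; d++) sums[c][d] += p[d];
            }
            boolean changed = false;
            for (int c = 0; c < k; c++) { // update step
                if (counts[c] == 0) continue; // an empty cluster keeps its old centroid
                for (int d = 0; d < dim; d++) {
                    double m = sums[c][d] / counts[c];
                    if (m != centroids[c][d]) changed = true;
                    centroids[c][d] = m;
                }
            }
            if (!changed) break; // converged
        }
        return centroids;
    }

    /** Sum of squared distances from each point to its nearest centroid. */
    static double sse(double[][] data, double[][] centroids) {
        double total = 0.0;
        for (double[] p : data) {
            total += sq(p, centroids[nearest(p, centroids)]);
        }
        return total;
    }

    /** Tries seeds 0..restarts-1 and returns the centroids with the lowest SSE. */
    static double[][] best(double[][] data, int k, int restarts, int maxIter) {
        double[][] bestCentroids = null;
        double bestSse = Double.POSITIVE_INFINITY;
        for (long seed = 0; seed < restarts; seed++) {
            double[][] c = run(data, k, seed, maxIter);
            double e = sse(data, c);
            if (e < bestSse) {
                bestSse = e;
                bestCentroids = c;
            }
        }
        return bestCentroids;
    }
}
```

Restarts do not guarantee the global optimum; they only make it less likely that the reported result is one of the bad runs. With mixed categorical and continuous data, the squared Euclidean distance above would have to be replaced by a distance that handles both kinds of attribute.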

 

3.  On prefix-based trees

> I'm searching for a prefix-based tree. I'm using an implementation of a
> CPT (Compact Patricia Tree) from my university, but it's very greedy for
> memory.
>
> I could not find a prefix-based tree in the weka-wiki, so just to be
> sure: Am I blind or isn't one implemented in the weka package?

Not sure what you want to use the tree for, but maybe the
weka.core.Trie class (for strings) would be helpful?

Cheers, Peter
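
To make clear what kind of structure is being discussed, here is a minimal prefix tree for strings in plain Java. It is a hypothetical sketch: the class `PrefixTreeSketch` and its methods are made up for illustration and are not the API of weka.core.Trie.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal prefix tree; this is not the API of weka.core.Trie.
final class PrefixTreeSketch {

    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean terminal; // true if a stored string ends at this node
    }

    private final Node root = new Node();

    /** Stores a string, creating one node per character as needed. */
    void add(String s) {
        Node node = root;
        for (char ch : s.toCharArray()) {
            node = node.children.computeIfAbsent(ch, key -> new Node());
        }
        node.terminal = true;
    }

    /** Returns the node reached by following s, or null if the path does not exist. */
    private Node walk(String s) {
        Node node = root;
        for (char ch : s.toCharArray()) {
            node = node.children.get(ch);
            if (node == null) {
                return null;
            }
        }
        return node;
    }

    /** True if exactly this string was added. */
    boolean contains(String s) {
        Node node = walk(s);
        return node != null && node.terminal;
    }

    /** True if at least one stored string starts with the given prefix. */
    boolean hasPrefix(String prefix) {
        return walk(prefix) != null;
    }
}
```

This uncompressed form allocates one node per character, so it will generally need more nodes than a Patricia tree, which merges chains of single-child nodes.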

 

4.  On text classification and multi-label classification

> Thanks for the information. But I have a simple question. Do the Weka
> ensemble or boosting algorithms support the following things automatically:
> (1) Multi-class with multiple labels

Just to clarify: Weka allows you only to have a single class
attribute. See also FAQ "Does WEKA support multi-label
classification?".

If you develop an ensemble classifier, then it depends a lot on the
base classifier(s) what data can be processed (= their capabilities).
The MultipleClassifiersCombiner superclass, for instance, returns as
capabilities only the capabilities that *all* of the base classifiers
share (see "getCapabilities()" method).
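
The "only what all base classifiers share" rule is a set intersection. The sketch below illustrates that idea with a made-up enum; `CapabilitySketch`, its `Capability` values and the `shared` method are stand-ins for illustration and are not Weka's Capabilities API.

```java
import java.util.EnumSet;
import java.util.List;

// Hypothetical illustration; the enum and method are stand-ins, not Weka's Capabilities API.
final class CapabilitySketch {

    enum Capability {
        NOMINAL_ATTRIBUTES, NUMERIC_ATTRIBUTES, MISSING_VALUES, NOMINAL_CLASS, NUMERIC_CLASS
    }

    /** An ensemble can only handle what every one of its base classifiers can handle. */
    static EnumSet<Capability> shared(List<EnumSet<Capability>> baseCapabilities) {
        // start from everything and intersect with each base classifier's set
        EnumSet<Capability> result = EnumSet.allOf(Capability.class);
        for (EnumSet<Capability> caps : baseCapabilities) {
            result.retainAll(caps);
        }
        return result;
    }
}
```

If one base classifier handles nominal and numeric attributes with a nominal class, and another handles numeric attributes, missing values and a nominal class, the combination is left with numeric attributes and a nominal class only. Note that this sketch returns every capability for an empty list, which a real implementation would probably treat differently.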

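Blogger's note: Qu Wei works on multi-label classification, so you can ask him if you have questions. As for how to handle it in Weka, see the answer in the FAQ. The FAQ address is in my bookmarked links.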


> (2) feature vectors to represent documents

See FAQ "How do I perform text classification?".

A link to the FAQs is available from the Weka homepage.

 

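Blogger's note: this FAQ entry explains how to do text classification in Weka. I used to pair WVTool with Weka for text classification, and only now discovered that Weka already supports it. Take a close look if you are interested. Because Weka's support for Chinese is poor and its tokenizer is very simple, there is still a lot of work you have to do yourself.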

Cheers, Peter
--
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
http://www.cs.waikato.ac.nz/~fracpete/           Ph. +64 (7) 858-5174
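
As a rough picture of what "feature vectors to represent documents" means, here is a bag-of-words sketch in plain Java. It is not Weka code: the class `BagOfWordsSketch`, its method names and the tokenizer (lower-case, split on anything that is not a letter or digit) are assumptions for illustration. The tokenizer keeps a run of Chinese characters as one token, so Chinese text would need a real word segmenter first.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical plain-Java sketch of a bag-of-words representation; the simple
// tokenizer is an assumption and does not segment Chinese text into words.
final class BagOfWordsSketch {

    /** Lower-cases the text and splits it on runs of non-letter, non-digit characters. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) { // split can yield a leading empty string
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Assigns each distinct token an index in order of first appearance. */
    static Map<String, Integer> vocabulary(List<String> documents) {
        Map<String, Integer> vocab = new LinkedHashMap<>();
        for (String doc : documents) {
            for (String token : tokenize(doc)) {
                vocab.putIfAbsent(token, vocab.size());
            }
        }
        return vocab;
    }

    /** Counts how often each vocabulary word occurs in the document; unknown words are ignored. */
    static int[] vectorize(String document, Map<String, Integer> vocab) {
        int[] counts = new int[vocab.size()];
        for (String token : tokenize(document)) {
            Integer index = vocab.get(token);
            if (index != null) {
                counts[index]++;
            }
        }
        return counts;
    }
}
```

Each document becomes an array of word counts over a shared vocabulary, and that array is the kind of numeric feature vector a classifier can be trained on.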

 

All email content quoted in this post belongs to its senders. It is only reposted here, and copyright remains with the senders.
