Two learning approaches for protein name extraction
dc.citation.epage | 1055 | en_US |
dc.citation.issueNumber | 6 | en_US |
dc.citation.spage | 1046 | en_US |
dc.citation.volumeNumber | 42 | en_US |
dc.contributor.author | Tatar, S. | en_US |
dc.contributor.author | Cicekli, I. | en_US |
dc.date.accessioned | 2016-02-08T10:01:21Z | |
dc.date.available | 2016-02-08T10:01:21Z | |
dc.date.issued | 2009 | en_US |
dc.department | Department of Computer Engineering | en_US |
dc.description.abstract | Protein name extraction, one of the basic tasks in the automatic extraction of information from biological texts, remains challenging. In this paper, we explore two machine learning techniques for this task and present the results of our experiments. The first method uses a bigram language model to extract protein names. The second uses an automatic rule learning method that identifies protein names in biological texts. In both cases, we generalize protein names by using hierarchically categorized syntactic token types. We conducted our experiments on two datasets. The method based on the bigram language model achieved an F-score of 67.7% on the YAPEX dataset and 66.8% on the GENIA corpus. The rule learning method obtained an F-score of 61.8% on the YAPEX dataset and 61.0% on the GENIA corpus. The results of the comparative experiments demonstrate that both techniques are applicable to automatic protein name extraction, a prerequisite for the large-scale processing of biomedical literature. © 2009 Elsevier Inc. All rights reserved. | en_US |
dc.identifier.doi | 10.1016/j.jbi.2009.05.004 | en_US |
dc.identifier.issn | 1532-0464 | en_US |
dc.identifier.uri | http://hdl.handle.net/11693/22534 | en_US |
dc.language.iso | English | en_US |
dc.publisher | Academic Press | en_US |
dc.relation.isversionof | http://dx.doi.org/10.1016/j.jbi.2009.05.004 | en_US |
dc.source.title | Journal of Biomedical Informatics | en_US |
dc.subject | Bigram language model | en_US |
dc.subject | Information extraction | en_US |
dc.subject | Protein name extraction | en_US |
dc.subject | Rule learning | en_US |
dc.subject | Statistical learning | en_US |
dc.subject | Computational linguistics | en_US |
dc.subject | Experiments | en_US |
dc.subject | Information analysis | en_US |
dc.subject | Learning algorithms | en_US |
dc.subject | Education | en_US |
dc.subject | Protein | en_US |
dc.subject | Information retrieval | en_US |
dc.subject | Machine learning | en_US |
dc.subject | Medical informatics | en_US |
dc.subject | Priority journal | en_US |
dc.subject | Artificial Intelligence | en_US |
dc.subject | Computational biology | en_US |
dc.subject | Information storage and retrieval | en_US |
dc.subject | Natural language processing | en_US |
dc.subject | Terminology as topic | en_US |
dc.title | Two learning approaches for protein name extraction | en_US |
dc.type | Article | en_US |
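The abstract mentions a bigram language model applied to protein names that have been generalized into syntactic token types. The following is a minimal sketch of that general idea, not the authors' implementation: the token categories in `token_type`, the add-one smoothing, the class name `BigramTypeModel`, and the toy training names are all assumptions made for illustration, and the paper's hierarchical type categories are not reproduced here.

```python
import math
import re
from collections import Counter


def token_type(token):
    """Map a token to a coarse syntactic type (hypothetical categories)."""
    if re.fullmatch(r"[A-Z]+", token):
        return "ALLCAPS"
    if re.fullmatch(r"[A-Z][a-z]+", token):
        return "CAPITALIZED"
    if re.fullmatch(r"[a-z]+", token):
        return "LOWER"
    if re.fullmatch(r"\d+", token):
        return "NUMBER"
    if re.fullmatch(r"[A-Za-z]+\d+[A-Za-z0-9]*", token):  # e.g. "p53"
        return "ALPHANUM"
    return "OTHER"


def type_sequence(phrase):
    """Whitespace-tokenize a phrase and wrap its type sequence in boundary markers."""
    return ["<s>"] + [token_type(t) for t in phrase.split()] + ["</s>"]


class BigramTypeModel:
    """Bigram model over token types with add-one smoothing (an assumption)."""

    def __init__(self):
        self.bigrams = Counter()   # counts of (previous type, current type)
        self.contexts = Counter()  # counts of previous type
        self.vocab = set()         # all types seen in training, incl. markers

    def train(self, names):
        for name in names:
            types = type_sequence(name)
            self.vocab.update(types)
            for prev, cur in zip(types, types[1:]):
                self.bigrams[(prev, cur)] += 1
                self.contexts[prev] += 1

    def log_prob(self, candidate):
        """Return the smoothed log-probability of the candidate's type sequence."""
        types = type_sequence(candidate)
        v = len(self.vocab) or 1
        total = 0.0
        for prev, cur in zip(types, types[1:]):
            total += math.log(
                (self.bigrams[(prev, cur)] + 1) / (self.contexts[prev] + v)
            )
        return total


# Toy training names, chosen only for illustration.
model = BigramTypeModel()
model.train(["protein kinase C", "p53", "interleukin 2 receptor", "NF kappa B"])

# Candidates whose type pattern resembles the training names score higher.
print(model.log_prob("p21"), model.log_prob("42 42"))
```

In a sketch like this, a candidate phrase would be accepted as a protein name when its score exceeds some threshold. How the threshold is chosen, and how candidates are generated from running text, are left open here.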
Files
Original bundle
- Name: Two learning approaches for protein name extraction.pdf
- Size: 443.69 KB
- Format: Adobe Portable Document Format
- Description: Full printable version