Two learning approaches for protein name extraction

Tatar, S.; Cicekli, I.

dc.citation.epage	1055	en_US
dc.citation.issueNumber	6	en_US
dc.citation.spage	1046	en_US
dc.citation.volumeNumber	42	en_US
dc.contributor.author	Tatar, S.	en_US
dc.contributor.author	Cicekli, I.	en_US
dc.date.accessioned	2016-02-08T10:01:21Z
dc.date.available	2016-02-08T10:01:21Z
dc.date.issued	2009	en_US
dc.department	Department of Computer Engineering	en_US
dc.description.abstract	Protein name extraction, one of the basic tasks in automatic extraction of information from biological texts, remains challenging. In this paper, we explore the use of two different machine learning techniques and present the results of our experiments. In the first method, a bigram language model is used to extract protein names. In the second, we use an automatic rule learning method that identifies protein names in biological texts. In both cases, we generalize protein names by using hierarchically categorized syntactic token types. We conducted our experiments on two different datasets. The first method, based on the bigram language model, achieved an F-score of 67.7% on the YAPEX dataset and 66.8% on the GENIA corpus. The rule learning method obtained an F-score of 61.8% on the YAPEX dataset and 61.0% on the GENIA corpus. The results of the comparative experiments demonstrate that both techniques are applicable to the task of automatic protein name extraction, a prerequisite for the large-scale processing of biomedical literature. © 2009 Elsevier Inc. All rights reserved.	en_US
dc.identifier.doi	10.1016/j.jbi.2009.05.004	en_US
dc.identifier.issn	1532-0464	en_US
dc.identifier.uri	http://hdl.handle.net/11693/22534	en_US
dc.language.iso	English	en_US
dc.publisher	Academic Press	en_US
dc.relation.isversionof	http://dx.doi.org/10.1016/j.jbi.2009.05.004	en_US
dc.source.title	Journal of Biomedical Informatics	en_US
dc.subject	Bigram language model	en_US
dc.subject	Information extraction	en_US
dc.subject	Protein name extraction	en_US
dc.subject	Rule learning	en_US
dc.subject	Statistical learning	en_US
dc.subject	Computational linguistics	en_US
dc.subject	Experiments	en_US
dc.subject	Information analysis	en_US
dc.subject	Learning algorithms	en_US
dc.subject	Education	en_US
dc.subject	Protein	en_US
dc.subject	Information retrieval	en_US
dc.subject	Machine learning	en_US
dc.subject	Medical informatics	en_US
dc.subject	Priority journal	en_US
dc.subject	Artificial Intelligence	en_US
dc.subject	Computational biology	en_US
dc.subject	Information storage and retrieval	en_US
dc.subject	Natural language processing	en_US
dc.subject	Terminology as topic	en_US
dc.title	Two learning approaches for protein name extraction	en_US
dc.type	Article	en_US

Files

Original bundle

Name: Two learning approaches for protein name extraction.pdf
Size: 443.69 KB
Format: Adobe Portable Document Format
Description: Full printable version

Collections

Scholarly Publications - Computer Engineering