Two learning approaches for protein name extraction

dc.citation.epage: 1055 [en_US]
dc.citation.issueNumber: 6 [en_US]
dc.citation.spage: 1046 [en_US]
dc.citation.volumeNumber: 42 [en_US]
dc.contributor.author: Tatar, S. [en_US]
dc.contributor.author: Cicekli, I. [en_US]
dc.date.accessioned: 2016-02-08T10:01:21Z
dc.date.available: 2016-02-08T10:01:21Z
dc.date.issued: 2009 [en_US]
dc.department: Department of Computer Engineering [en_US]
dc.description.abstract: Protein name extraction, one of the basic tasks in automatic extraction of information from biological texts, remains challenging. In this paper, we explore the use of two different machine learning techniques and present the results of our experiments. In the first method, a bigram language model is used to extract protein names. In the second, we use an automatic rule learning method that can identify protein names in biological texts. In both cases, we generalize protein names by using hierarchically categorized syntactic token types. We conducted our experiments on two different datasets. Our first method, based on a bigram language model, achieved an F-score of 67.7% on the YAPEX dataset and 66.8% on the GENIA corpus. The rule learning method achieved an F-score of 61.8% on the YAPEX dataset and 61.0% on the GENIA corpus. The results of the comparative experiments demonstrate that both techniques are applicable to the task of automatic protein name extraction, a prerequisite for the large-scale processing of biomedical literature. © 2009 Elsevier Inc. All rights reserved. [en_US]
dc.identifier.doi: 10.1016/j.jbi.2009.05.004 [en_US]
dc.identifier.issn: 1532-0464 [en_US]
dc.identifier.uri: http://hdl.handle.net/11693/22534 [en_US]
dc.language.iso: English [en_US]
dc.publisher: Academic Press [en_US]
dc.relation.isversionof: http://dx.doi.org/10.1016/j.jbi.2009.05.004 [en_US]
dc.source.title: Journal of Biomedical Informatics [en_US]
dc.subject: Bigram language model [en_US]
dc.subject: Information extraction [en_US]
dc.subject: Protein name extraction [en_US]
dc.subject: Rule learning [en_US]
dc.subject: Statistical learning [en_US]
dc.subject: Computational linguistics [en_US]
dc.subject: Experiments [en_US]
dc.subject: Information analysis [en_US]
dc.subject: Learning algorithms [en_US]
dc.subject: Education [en_US]
dc.subject: Protein [en_US]
dc.subject: Information retrieval [en_US]
dc.subject: Machine learning [en_US]
dc.subject: Medical informatics [en_US]
dc.subject: Priority journal [en_US]
dc.subject: Artificial Intelligence [en_US]
dc.subject: Computational biology [en_US]
dc.subject: Information storage and retrieval [en_US]
dc.subject: Natural language processing [en_US]
dc.subject: Terminology as topic [en_US]
dc.title: Two learning approaches for protein name extraction [en_US]
dc.type: Article [en_US]
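
The abstract relies on two ideas that are easier to follow with a concrete picture: generalizing tokens into syntactic token types, and using a bigram model to decide which tokens belong to a protein name. The following is a minimal sketch under stated assumptions, not the authors' implementation. The flat token-type set in `token_type`, the B/I/O labels, the add-one smoothing, the class name `BigramTagger`, and the toy `TRAIN` sentences are all hypothetical. The bigram model is cast here as a first-order HMM tagger decoded with Viterbi, which is one common way to apply a bigram model to name tagging and may differ from the paper's formulation. The sketch also shows how an F-score such as those reported above is computed from precision P and recall R, F = 2PR / (P + R), over exact-match name spans.

```python
import math
import re
from collections import defaultdict

def token_type(tok: str) -> str:
    """Map a token to a coarse syntactic type (hypothetical, flat stand-in
    for the paper's hierarchically categorized token types)."""
    if re.fullmatch(r"[0-9]+", tok):
        return "Digit"
    if re.fullmatch(r"[A-Z]+[0-9]+[A-Z0-9]*", tok):
        return "AllUpperDigit"      # e.g. IL2, CD4
    if re.fullmatch(r"[A-Z]{2,}", tok):
        return "AllUpper"           # e.g. DNA
    if re.search(r"[A-Za-z]", tok) and re.search(r"[0-9]", tok):
        return "MixedDigit"         # e.g. p53
    if re.fullmatch(r"[A-Z][a-z]+", tok):
        return "Capitalized"
    if re.fullmatch(r"[a-z]+", tok):
        return "Lower"
    return "Other"

class BigramTagger:
    """Hypothetical first-order (bigram) HMM over B/I/O labels that emits
    token types, with add-alpha smoothing. Illustrative only."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha
        self.trans = defaultdict(lambda: defaultdict(float))  # prev label -> label counts
        self.emit = defaultdict(lambda: defaultdict(float))   # label -> token-type counts
        self.labels, self.types = set(), set()

    def fit(self, corpus):
        for toks, labs in corpus:
            prev = "<s>"  # sentence-start state
            for tok, lab in zip(toks, labs):
                typ = token_type(tok)
                self.trans[prev][lab] += 1
                self.emit[lab][typ] += 1
                self.labels.add(lab)
                self.types.add(typ)
                prev = lab
        return self

    def _logp(self, table, given, outcome, n_outcomes):
        # Smoothed log P(outcome | given); .get avoids inserting empty entries.
        counts = table.get(given, {})
        total = sum(counts.values())
        return math.log((counts.get(outcome, 0.0) + self.alpha)
                        / (total + self.alpha * n_outcomes))

    def tag(self, toks):
        """Viterbi decoding: most probable label sequence for the tokens."""
        if not toks:
            return []
        labels = sorted(self.labels)
        n_lab, n_typ = len(labels), len(self.types) + 1  # +1 reserves mass for unseen types
        # best[i][lab] = (log-prob of best path ending in lab at i, previous label)
        best = [{lab: (self._logp(self.trans, "<s>", lab, n_lab)
                       + self._logp(self.emit, lab, token_type(toks[0]), n_typ), None)
                 for lab in labels}]
        for tok in toks[1:]:
            typ = token_type(tok)
            col = {}
            for lab in labels:
                score, prev = max((best[-1][p][0] + self._logp(self.trans, p, lab, n_lab), p)
                                  for p in labels)
                col[lab] = (score + self._logp(self.emit, lab, typ, n_typ), prev)
            best.append(col)
        lab = max(labels, key=lambda l: best[-1][l][0])
        path = [lab]
        for col in reversed(best[1:]):  # follow back-pointers to position 0
            lab = col[lab][1]
            path.append(lab)
        return path[::-1]

def spans(labels):
    """Convert a B/I/O label sequence into a set of (start, end) name spans."""
    out, start = set(), None
    for i, lab in enumerate(list(labels) + ["O"]):
        if lab != "I" and start is not None:
            out.add((start, i))
            start = None
        if lab == "B":
            start = i
    return out

def f_score(gold, pred):
    """F = 2PR / (P + R) over exact-match spans."""
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

# Toy corpus (invented for illustration, not YAPEX or GENIA).
TRAIN = [
    (["The", "IL2", "gene", "is", "active"], ["O", "B", "O", "O", "O"]),
    (["We", "found", "CD4", "levels"], ["O", "O", "B", "O"]),
]

if __name__ == "__main__":
    tagger = BigramTagger().fit(TRAIN)
    print(tagger.tag(["We", "studied", "CD8", "expression"]))
```

On this toy corpus, tagging `["We", "studied", "CD8", "expression"]` yields `["O", "O", "B", "O"]`: the unseen token CD8 is recognized only because it shares the `AllUpperDigit` type with IL2 and CD4, which illustrates why generalizing over token types rather than raw words helps with names never seen in training.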

Files

Original bundle

Name: Two learning approaches for protein name extraction.pdf
Size: 443.69 KB
Format: Adobe Portable Document Format
Description: Full printable version