Because the attribute information of a profile stored in a web page is usually expressed in natural language, it is very difficult to extract the target information from the HTML structure alone. In this paper, Conditional Random Fields (CRFs) are adopted to extract personal attribute information from the personal details in web pages. A segmentation system first divides the HTML document into a sequence of words; an appropriate feature template is then established and the sample sequences are used for training. Finally, the feature function model generated by the CRFs is used to label the test sequences and identify the information that needs to be extracted. The experimental results show that the labeling and inference capabilities of CRFs on text sequences can be used to extract specific attribute information from personal home pages very well.
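
The first two steps described above (turning an HTML document into a word sequence and applying a feature template to each position) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the paper's actual setup: the regex-based `tokenize` stands in for the segmentation system, and the feature names in `token_features` are a hypothetical template rather than the one used in the experiments.

```python
import re


def tokenize(html):
    """Stand-in for the segmentation system (assumption): strip tags,
    then split the remaining text into word and punctuation tokens."""
    text = re.sub(r"<[^>]+>", " ", html)
    return re.findall(r"\w+|[^\w\s]", text)


def token_features(tokens, i):
    """Hypothetical feature template for position i: properties of the
    current token plus the lowercased neighbouring tokens."""
    tok = tokens[i]
    feats = {
        "word.lower": tok.lower(),
        "word.istitle": tok.istitle(),
        "word.isdigit": tok.isdigit(),
    }
    if i > 0:
        feats["prev.lower"] = tokens[i - 1].lower()
    else:
        feats["BOS"] = True  # beginning of sequence
    if i < len(tokens) - 1:
        feats["next.lower"] = tokens[i + 1].lower()
    else:
        feats["EOS"] = True  # end of sequence
    return feats


def sequence_features(tokens):
    """Apply the template to every position in the token sequence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


tokens = tokenize("<p>Name: Li Ming</p>")
features = sequence_features(tokens)
```

Each token is mapped to a dictionary of features; paired with one label per token (for example a hypothetical `NAME` or `OTHER` tag), such sequences are the kind of input a linear-chain CRF trainer typically consumes, and the trained model then assigns labels to unseen test sequences.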
