Statistics for A template-independent content extraction approach for new web pages