On URL and content persistence
| Summary: | This report presents a study of URL and content persistence among 51 million pages from a national web harvested 8 times over almost 3 years. This study differs from previous ones because it describes the evolution of a large set of web pages for several years, studying in depth the characteristics of persistent data. We found that the persistence of URLs and contents follows a logarithmic distribution. We characterized persistent URLs and contents, and identified reasons for URL death. We found that lasting contents tend to be referenced by different URLs during their lifetime. On the other hand, half of the contents referenced by persistent URLs did not change |
|---|---|
| Main Authors: | Gomes, Daniel |
| Other Authors: | Silva, Mário J. |
| Subject: | URL persistence; content persistence; tomba |
| Year: | 2005 |
| Country: | Portugal |
| Document type: | report |
| Access type: | open access |
| Associated institution: | Universidade de Lisboa |
| Language: | Portuguese |
| Origin: | Repositório da Universidade de Lisboa |