1 Introduction

The web continues to expand, and the dominant search engines, Google and Yahoo!, claim to have indexed more than 20 billion pages (Mayer 2005). Recent statistics on Internet usage by language show that 31.2% of users are English-speaking and 68.8% are non-English-speaking (Internet World Statistics 2007b). As non-English web usage increases, a growing number of non-English queries must be handled by the search engines.

The goals of this research are to:

  (a) evaluate how well search engines respond to Greek language queries;

  (b) assess whether the Greek or the global search engines are more effective in satisfying user requests; and,

  (c) evaluate the extent of coverage of the Greek web by the ten search engines.

Preliminary results of the present study, as they pertain to (a) and part of (b) above, appeared in Efthimiadis et al. (2008). To achieve these goals the study was conducted as follows:

  1. a set of queries was searched in 10 search engines (5 Greek, 5 global) and the results were evaluated to see if the correct answer was returned;

  2. all the URLs found in the result sets were retrieved to identify the percentage that were live (active) or dead (non-active) links;

  3. a sample of 32480 active URLs from the Greek web was used to evaluate whether the search engines had them indexed.

The organization of the paper is as follows: Sect. 2 reviews related work, Sect. 3 gives a brief overview of the Greek language, Sect. 4 presents the methodology, Sect. 5 discusses the results, and the conclusions are given in Sect. 6.

2 Related work

Bar-Ilan and Gutman (2005) explored how search engines respond to queries in four non-English languages: Russian, French, Hungarian, and Hebrew. For each language they searched three global search engines, AltaVista, FAST, and Google, and two or three local engines. The local engines were the Russian Yandex, Rambler, and Aport; the French Voila, AOL France, and La Toile du Québec; the Hungarian Origo-vizsla, Startlap, and Heureka; and the Hebrew Morfix and Walla. For each of the four languages the authors developed queries that emphasized specific linguistic characteristics of that language. The first ten results of each search were evaluated not for relevance, but for whether the exact word form or a morphological variant of the query was retrieved. They found that the search engines ignored the special language characteristics and did not handle diacritics well.

Moukdad (2004) studied how three global search engines, AltaVista, AllTheWeb, and Google, handle Arabic queries compared to three Arabic engines, Al bahhar, Ayna, and Morfix. He employed the same methodology used by Bar-Ilan and Gutman (2005). A set of eight Arabic search terms was selected and run in the six search engines. He found that the global search engines had shortcomings in handling Arabic.

Sroka (2000) evaluated Polish versions of English language search engines and Polish search engines. The evaluation focused on search capability and retrieval performance. Precision was based on relevance judgments for the first 10 matches from each search engine. The overlap of retrieved documents and the response time for each search engine were recorded. Of the five search engines that were evaluated, Polski Infoseek and Onet.pl had the best precision scores, and Polski Infoseek turned out to be the fastest Web search engine.

Kelly-Holmes (2006) conducted a study searching with Irish Gaelic words on the Irish language version of Google. Five words from ‘typical’ and ‘non-typical’ domains for Irish were used, and the results were analyzed in terms of the “authenticity” of the search process and results, the language usage in the sites found through the search process, and the domains represented by the results. The study identified a number of problems encountered when searching using the Irish Gaelic language.

Bitirim et al. (2002) investigated the performance of four Turkish search engines, Arabul, Arama, Netbul, and Superonline, with respect to precision, normalized recall, coverage, and novelty ratios, using seventeen queries. The queries were carefully selected to assess a search engine's ability to handle broad or narrow topics, exclude particular information, identify and index Turkish characters, retrieve authoritative pages, stem Turkish words, and correctly interpret Boolean operators. Arama appeared to be the best Turkish search engine in terms of average precision, normalized recall, and coverage of Turkish sites. The handling of Turkish characters and stemming still caused problems for the Turkish search engines. Superonline and Netbul make use of the indexing information in meta-tag fields to improve retrieval results.

Griesbaum (2004) investigated the retrieval effectiveness of three popular German Web search services: AltaVista.de, Google.de and Lycos.de. Fifty queries were used, both in German and in their English translation, and the top twenty results were evaluated for precision. The findings indicated that Google performed significantly better than AltaVista, but there was no significant difference between Google and Lycos. Lycos also achieved better values than AltaVista, but the differences were not statistically significant. Compared to a similar study by the author in 2002 the results were similar, but the gaps between the engines had narrowed. The overall conclusion of the study was that the engines' retrieval performance was very similar.

Lazarinis (2007) evaluated the performance of eleven search engines, seven global (AlltheWeb, AltaVista, AOL, ASK, Google, MSN, Yahoo) and four Greek (Anazitisis, In.gr, Pathfinder, Robby), using six Greek language queries. He employed thirty-one users, divided into six groups, each group searching one query. Each group member retrieved twenty results, and the results retrieved by all group members were evaluated for relevance collectively by the members of each group. Lazarinis reports that the precision of all engines was very similar. Based on the six queries, the study further investigated how the engines handle upper and lower case input, diacritics, stemming, and stop words, and noted variations in the handling of Greek.

Moukdad and Cui (2005) investigated how Chinese language queries are handled by Google and AlltheWeb, as well as by the Chinese search engines Sohu and Baidu. They created ten queries by selecting terms from a Chinese-English dictionary; the terms emphasized certain linguistic characteristics of Chinese. The queries were searched in the Simplified Chinese script in use in mainland China, and the results were evaluated on the number of retrieved documents, word segmentation, and correct display of Chinese characters. Moukdad and Cui found that the global search engines did not apply any linguistic processing and thus could not process the Chinese queries satisfactorily, which introduced unexpected results.

3 The Greek language

Greek is a rich, highly inflectional language that dates to the 9th century BC. The Greek language uses a different script from that of Latin-based languages. The Greek alphabet has twenty-four upper case letters, twenty-five lower case letters (lower case sigma has two forms, σ and ς), and a number of diacritics or accent marks, depending on the form of the language used (see Fig. 1).

Fig. 1
figure 1

The Greek language alphabet

The most commonly known forms of the Greek language are ancient or classical Greek, Katharevousa, and Demotic Greek (Dhimotiki) (Babiniotis 1998). Depending on the system of accents used, Greek is either polytonic or monotonic. The polytonic orthography for Greek uses three accents, two breathings, iota subscripts, and the diaeresis. The polytonic system had been in use since ancient times and was simplified into the monotonic system in 1982. The monotonic system uses one accent and the diaeresis, which signifies that two adjacent vowels are pronounced separately and not as a diphthong.

Transliteration of Greek to Latin letters is common but adds to the complexity of processing Greek because of the different transliteration standards. Furthermore, individuals often ignore the standards and apply their own phonetic interpretation. The widespread use of computers and the Internet coupled with the slow progress in adopting non-Latin-based scripts has given rise to Greeklish, which is a form of transliteration used to exchange email messages and post to discussion forums (Karakos 2003; Tzekou et al. 2007).
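To make the variability concrete, the sketch below applies one possible Greeklish mapping in Python. The mapping table and function name are ours, and the table reflects just one common phonetic convention; as noted above, real users mix phonetic and visual conventions freely, so no single table is authoritative.

```python
# Illustrative only: GREEKLISH is one of many competing phonetic conventions.
GREEKLISH = {
    "α": "a", "ά": "a", "β": "v", "γ": "g", "δ": "d", "ε": "e", "έ": "e",
    "ζ": "z", "η": "i", "ή": "i", "θ": "th", "ι": "i", "ί": "i",
    "κ": "k", "λ": "l", "μ": "m", "ν": "n", "ξ": "x", "ο": "o", "ό": "o",
    "π": "p", "ρ": "r", "σ": "s", "ς": "s", "τ": "t", "υ": "y", "ύ": "y",
    "φ": "f", "χ": "x", "ψ": "ps", "ω": "o", "ώ": "o",
}

def to_greeklish(word: str) -> str:
    """Transliterate a Greek word character by character; characters not
    in the mapping (Latin letters, digits) pass through unchanged."""
    return "".join(GREEKLISH.get(ch, ch) for ch in word.lower())
```

Under this particular convention "χωριό" becomes "xorio"; another user might equally well write "xwrio" or "horio", which is exactly why Greeklish complicates matching.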

Alevizos et al. (1988) discuss the challenges faced by search systems in handling Greek. Kalamboukis (1995) introduces the inflectional aspects of Greek and presents a stemming approach.

4 Methodology

The methodology used in carrying out this research is presented in this section. A user need scenario is first introduced. The search engines and the search process are presented. The subject categories selected for the navigational queries follow. A discussion of the evaluation criteria used for each part of the study concludes the section.

4.1 User needs

The use of the Internet by Greeks saw a threefold increase between 2000 and 2006, jumping from 9.1% to 33.5% (Internet World Statistics 2007a). Similarly, the Greek web has proliferated, with an increasing presence of governmental and commercial entities. In both 2004 and 2006, most Greek web pages (63.5% and 63.4%, respectively) were in the Greek language (Efthimiadis and Castillo 2004). Most Greeks learn a second language to some degree of proficiency; however, it is reasonable to assume that Greeks would search in Greek to find information in the Greek web. Following the Broder (2002) classification of web queries we selected the "navigational" class as the basis of a user task definition. We assume that a user will search to find the specific site of an organization. In that respect our methodology relates to that of Hawking et al. (2001).

4.2 Search engines and the search process

Ten search engines were used in this study. These were divided into two groups, five global or international in scope, and five Greek search engines. The global search engines are: A9, AltaVista, Google, MSN Search (this is not Live Search, as Live was introduced after the study was concluded), and Yahoo!. The Greek engines are: Anazitisis, Ano-Kato, Phantis, Trinity, and Visto. The Appendix lists the engines and the corresponding URLs used to send the search requests.

A program in Java was developed to submit queries to each search engine automatically. The returned results were downloaded and stored in a MySQL database for further processing and analysis. The process is depicted in Fig. 2 and discussed throughout the methodology section.
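The submission step amounts to substituting each percent-encoded query into an engine-specific request URL. The Python sketch below illustrates the idea only; the study's actual harness was written in Java with MySQL storage, and the URL patterns shown here are hypothetical stand-ins for the entries listed in the Appendix.

```python
from urllib.parse import quote

# Hypothetical engine -> request-URL patterns (the real ones are in the
# Appendix; these two are illustrative stand-ins).
SEARCH_URL_TEMPLATES = {
    "google": "http://www.google.com/search?q={query}",
    "anazitisis": "http://www.anazitisis.gr/search?q={query}",
}

def build_request_url(engine: str, query: str) -> str:
    """Percent-encode the query, so that Greek (non-ASCII, UTF-8) keywords
    survive the query string, and substitute it into the engine's pattern."""
    return SEARCH_URL_TEMPLATES[engine].format(query=quote(query))
```

For example, the Greek keyword "χωριό" is sent as the UTF-8 byte sequence `%CF%87%CF%89%CF%81%CE%B9%CF%8C`.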

Fig. 2
figure 2

The search process

4.3 Subject categories and queries

Ten broad subject categories were identified using professional and business directories. The categories are: government departments, universities, colleges, travel agencies, museums, media (TV, radio, newspapers), transportation, and banks. Two hundred and seventeen (217) organizations that had a web presence were selected for searching. For each organization we established the formal name in Greek, its non-Greek equivalent if available (usually in English or other Latin-based language) and the URL(s) of the web site.

The URLs available for these organizations were used to download the corresponding webpages and verify that they were active. In addition, the robots.txt file was checked for every URL in order to establish whether there were any indexing restrictions on the page. At that time none of the organizations restricted search engines from crawling and indexing their pages. Consequently, all search engines should have had access to them.
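The robots.txt check described above can be sketched with Python's standard `urllib.robotparser`; the function name is ours.

```python
from urllib.robotparser import RobotFileParser

def crawl_allowed(robots_txt: str, path: str, agent: str = "*") -> bool:
    """Return True if the given robots.txt text permits crawling `path`.
    A missing or empty robots.txt imposes no restrictions."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)
```

An organization with no `Disallow` rules for `*` (the situation found for all 217 organizations) yields `True` for every page, i.e. no indexing restrictions.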

Queries were generated from the Greek and non-Greek (English or transliterated) versions of the names of the selected businesses or organizations. Table 1 lists the subject categories and the numbers of the Greek organizations that correspond to each category. There were a total of 217 organizations, of which 92 had a corresponding English or other non-Greek equivalent name, thus, resulting in 309 queries.

Table 1 Queries by subject category and language searched

Searches were submitted automatically to the engines in October 2004 and in August 2006; the exact same queries were used on both occasions. Examples of the queries are given in Table 2, which lists queries and their corresponding subject categories. Both the Greek form and the English or transliterated form of each name are given, together with the target URL. As appropriate, there is an indication of whether the non-Greek version of an organization's name is a direct translation into English, a transliteration, a combined form of translation and transliteration, or whether initials were used, new words were added, or part of the name was dropped. To simulate the input of a non-expert searcher, the queries were submitted in the typical lay-searcher format: the keywords typed out, separated by spaces. Advanced search operators and techniques were not used. Since these were "known item" searches, the ideal retrieval would be to get the target URL of that organization ranked first in the result set.

Table 2 Examples of queries used in the evaluation

4.4 Evaluation criteria

This section presents the criteria on which the evaluation was based. As this study aimed at evaluating both the effectiveness and the coverage of search engines in searching the Greek Web, the evaluation criteria are organized accordingly. The criteria used for evaluating the effectiveness of search engines are: (a) qualitative assessment of how the engines handle the Greek language; (b) precision at 10 documents (P@10); (c) mean reciprocal rank (MRR); (d) Navigational Query Discounted Cumulative Gain (NQ-DCG), a new heuristic evaluation measure developed for the study; (e) response time; and (f) the ratio of dead URL links returned. Coverage and freshness of the search engine indices are measured by the presence or absence of a large sample of Greek domain URLs and by the decay observed over the period of the study.

4.4.1 Evaluating search engines effectiveness in searching the Greek web

4.4.1.1 Greek language processing

Greek uses a different script from Latin, is highly inflectional, and has variable forms and orthography. To evaluate how the engines handle Greek, a set of queries was used that included keywords with and without accents. The results were qualitatively assessed in order to establish whether the engines take these differences into account.

4.4.1.2 Precision at 10 documents (P@10)

For each of the 309 queries searched, the top ten results were retrieved and evaluated. The methodology used for the evaluation includes the rank distribution of the successful results, failure rates, and precision at 10 documents (P@10). Precision at k documents is a well-established evaluation measure; however, it treats all top k answers equally.
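For a known-item (navigational) query, P@10 effectively reduces to whether the single target URL appears in the top ten. A minimal sketch, with function names of our choosing:

```python
def precision_at_k(results: list[str], target: str, k: int = 10) -> float:
    """Fraction of the top-k results equal to the target URL.
    For a navigational query this is 1/k when the target appears once
    in the top k, and 0 otherwise."""
    return sum(1 for url in results[:k] if url == target) / k

def success_at_k(results: list[str], target: str, k: int = 10) -> int:
    """1 if the target URL appears anywhere in the top k, else 0;
    per-engine success rates aggregate this value over all queries."""
    return int(target in results[:k])
```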

4.4.1.3 Mean reciprocal rank (MRR)

Each search engine is scored using the mean reciprocal rank (MRR) of the target URL (Hawking and Craswell 2002; Voorhees 1999). The reciprocal rank is the inverse of the rank at which the correct target URL is found; these values are then averaged across all queries. A score of zero is assigned if no correct target URL is found in the top 10 results.
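The definition above can be written out directly; the sketch below (function names are ours) assigns 1/rank when the target is found in the top ten and zero otherwise, then averages.

```python
def reciprocal_rank(results: list[str], target: str, k: int = 10) -> float:
    """1/rank of the target URL within the top k, or 0.0 if absent."""
    for rank, url in enumerate(results[:k], start=1):
        if url == target:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(runs: list[tuple[list[str], str]], k: int = 10) -> float:
    """Average the reciprocal ranks over (result list, target URL) pairs."""
    return sum(reciprocal_rank(r, t, k) for r, t in runs) / len(runs)
```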

4.4.1.4 Navigational query discounted cumulative gain (NQ-DCG)

The Navigational Query Discounted Cumulative Gain (NQ-DCG) is a new heuristic evaluation measure developed for the study. For every search the top ten results were downloaded and their rank order was recorded. These were then evaluated as to whether the target URL or its variants were found in the result set. For exact or partial matches the rank position was recorded. The measure includes two components, the rank position, and the depth of the page as indicated in the URL. The latter gives some credit for partial matches, assuming that the searcher will be able to identify that the returned result is related to the desired result. This way the search engine is penalized for the additional navigational effort that will be required by the user. This heuristic evaluation measure relates to the discounted cumulative gain (DCG) and normalized discounted cumulative gain (NDCG) (Järvelin and Kekäläinen 2000, 2002).

A more formal description of the measure is given below. If m is the number of search engines examined, then each search engine j, (where j = 1,2,…,m) is allocated a score based on the first k results for each query. If the position of the returned result is i (where i = 1,2,…,k) and V ji is the value of the returned result at position i for engine j, then

$$ V_{ji} = k-i + 1 $$

The contributed score (W ji ) of result at position i to the search engine j, is thus calculated as

$$ W_{ji} = \begin{cases} (k - n)\, V_{ji}, & n < k \\ 0, & n \ge k \end{cases} $$
(1)

where j = 1,2,…,m, and i = 1,2,…,k; where n is the number of subdomains in the returned URL result and n < k. Hence, if no subdomains exist in the returned URL then n = 0. Finally, the total score NQ-DCG j (where j = 1,2,…,m) for each search engine is calculated as:

$$ {\text{NQ-DCG}}_{j} = \sum\limits_{i = 1}^{k} {W_{ji} } $$
(2)

For the purposes of this study only the first 10 returned results are considered, k = 10, and the number of engines evaluated is 10 therefore m = 10.

In the examples below the total score assigned to a search engine is calculated based on the above NQ-DCG heuristic evaluation measure.

Example:

Let http://www.ypepth.gr be the target URL for the Ministry of Education (Υπουργείο Εθνικής Παιδείας και Θρησκευμάτων).

  (a) If the result returned by a search engine, say engine 7, for a query is found in the third place (i = 3) and contains only the main page (http://www.ypepth.gr), then n = 0 and, following the notation above, V 73 = k − i + 1 = 10 − 3 + 1 = 8, so the contributed score is W 73 = (k − n) * V 73 = (10 − 0) * 8 = 80.

  (b) If the result returned by the same search engine for the query is found in the second place (i = 2) and contains one subdomain (http://www.ypepth.gr/el_ec_category1806.htm), hence n = 1, then V 72 = k − i + 1 = 10 − 2 + 1 = 9, and the contributed score is W 72 = (k − n) * V 72 = (10 − 1) * 9 = 81.

  (c) For the URL below, returned in the eighth position, i = 8 and n = 2, as it contains two subdomains (http://www.ypepth.gr/docs/aitisi_ipotrofion_klirodotimatvn.doc); then V 78 = k − i + 1 = 10 − 8 + 1 = 3, yielding a contributed score of W 78 = (k − n) * V 78 = (10 − 2) * 3 = 24.

Without any loss of generality, assume that the remaining seven results contributed no weight at all (this could happen for example if all contained 10 subdomains).

  (d) Hence the total weight for search engine number 7, from Eq. 2, is calculated as

$$ {\text{NQ-DCG}}_{7} = \sum\limits_{i = 1}^{k} {W_{7i} } = 81 + 80 + 24 = 185 $$

It must be noted here that the coefficient (k − n) in (1) plays the role of a diminishing or discounting factor by penalizing results that only partially match the target URL and contain subdomains in the returned URLs. This can be implemented in different ways: for example, as it is implemented here, or by introducing a factor such as 1/(1 + n) based on the number of subdomains only. In both approaches the result would follow the same principle of penalizing the presence of subdomains, which in practice resembles DCG with more emphasis on the presence of subdomains. The proposed NQ-DCG approach has been adopted here as it reflects the presence of the subdomains more directly. Evaluation of these alternatives is beyond the scope of the present paper.
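The definitions of V ji and W ji, together with the paper's convention of counting URL path segments as "subdomains", can be checked against the worked example with a short sketch (function names are ours):

```python
from urllib.parse import urlparse

def subdomain_count(url: str) -> int:
    """n: the number of path segments ("subdomains" in the text);
    a main page such as http://www.ypepth.gr has n = 0."""
    return len([seg for seg in urlparse(url).path.split("/") if seg])

def result_value(i: int, k: int = 10) -> int:
    """V_ji = k - i + 1 for a match at rank position i."""
    return k - i + 1

def contributed_score(i: int, n: int, k: int = 10) -> int:
    """W_ji = (k - n) * V_ji when n < k, else 0 (Eq. 1)."""
    return (k - n) * result_value(i, k) if n < k else 0

def nq_dcg(matches: list[tuple[int, int]], k: int = 10) -> int:
    """Eq. 2: sum of contributed scores over the (i, n) pairs of matches;
    results without a match (or with n >= k) contribute nothing."""
    return sum(contributed_score(i, n, k) for i, n in matches)
```

With the three matches of the example, (i, n) = (2, 1), (3, 0), and (8, 2), the sketch reproduces the contributed scores 81, 80, and 24 and the total of 185.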

The proposed NQ-DCG evaluation measure follows a similar approach to NDCG. For example, NDCG uses a decay factor and measures the gain of each contribution depending on the level of relevance of each returned result. It accumulates the gains by calculating the sum of the gains. It discounts the gain of a returned result that is ranked low so that a highly ranked result will attribute more toward the gain. All these steps are encapsulated in the NQ-DCG evaluation measure.

NQ-DCG measures the gain for each result by discounting its merit according to how low it has been found in the rank order. This is reflected in the calculation of V ji . Thus a high score indicates good retrieval performance and a low score poor performance.

The cumulative gain for each search engine is calculated under NQ-DCG j , thus reflecting the similarities between the two measures. However, the major difference between the two schemes is that the proposed measure considers the results in terms of the number of subdomains the returned URL contains. This is calculated by W ji , where account is taken of the number of subdomains in the URL in an automated fashion, by discounting the relevance of the URL based on its distance from the target. Therefore, NQ-DCG is in principle similar to NDCG, and both model a person's judgment of a search engine better than measures such as precision at 10 or MRR.

4.4.1.5 Evaluating the returned results: response time

Response time, that is, the time from query submission to receipt of the result set, was recorded for each search engine using the computer's clock. Times were collected in all data collection periods and provide a measure for comparing the search speed of the ten search engines.
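The measurement amounts to wrapping each request with a wall-clock timer. A minimal sketch, where `fetch` is a stand-in for the actual HTTP call:

```python
import time

def timed(fetch, *args):
    """Run `fetch(*args)` and return (its result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fetch(*args)
    return result, time.perf_counter() - start
```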

In 2004 the searches were sent from the Athens University of Economics and Business (AUEB) computer lab. The 2006 data collection was conducted at the University of Athens and at the University of Macedonia computer labs. The computers used were desktops with Intel Pentium processors running Windows XP, with similar configurations. The network infrastructure is the same, since all the universities use the GRNET/EDET network. Therefore, conditions were very similar within each year.

4.4.1.6 Evaluating the returned results: Live versus dead links

The top ten results of each search were recorded and the URLs were extracted. Each URL was then called and its status was recorded in a binary mode as active (live) or non-active (dead) link. No further attempts were made to retrieve inaccessible links, since the average user usually would not persist once a “404 not found” error message is received. The search engines were therefore penalized for returning non-active links (Hawking et al. 2001). This provides an indication of the freshness of each search engine’s index and contributes to users’ cost, because it is associated with user frustration, time wasted, and overall dissatisfaction with the quality of the results and the search engine itself.
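The binary live/dead classification can be sketched as follows. The function names are ours; a single attempt is made per URL, mirroring the no-retry policy described above, and only the classification helper is exercised without network access.

```python
from urllib.request import urlopen
from urllib.error import HTTPError, URLError

def fetch_status(url: str, timeout: float = 10.0):
    """Single attempt: return the HTTP status code, or None if the
    request fails entirely (no retries)."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        return err.code
    except (URLError, OSError):
        return None

def classify(status) -> str:
    """Live for a 2xx/3xx status; dead otherwise (e.g. 404, or no response)."""
    if status is not None and 200 <= status < 400:
        return "live"
    return "dead"
```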

4.4.2 Evaluating search engine coverage of the Greek web

To further measure the extent of coverage of the Greek Web (.gr) and the freshness of the index of the search engines we used a sample of 32480 top level domain URLs that were crawled from the Greek Web (.gr) (Efthimiadis and Castillo 2004). These URLs were all active at the time of the first data collection in May 2005. Some were inaccessible either permanently or temporarily during the second data collection in October 2006. These were treated as dead links and excluded from the evaluation in order to avoid penalizing the search engines for not returning them (Hawking et al. 2001).

The 32480 URLs were submitted automatically to the search engines as queries through the developed Java program. The pseudo code of the algorithm is given in Table 3. The query syntax was tailored to each search engine. A similar methodology was used in the evaluation of Google, AltaVista, and AllTheWeb (Vaughan and Thelwall 2004) and of Google, Yahoo and Live (Vaughan and Zhang 2007). For example, for AltaVista, A9, Google, MSN, and Yahoo a URL could be searched using "site:www.aueb.gr", which returns a list of pages indexed by the search engine from that particular domain. Although the Greek search engines did not support the "site:", "link:", or "url:" types of searches, it was possible to search for the URL string and receive results that contained it. The results were then examined to determine whether the target URL was present. If the URL was not found in a search engine's index it was subsequently called with an HTTP request and the response was noted. If "HTTP 404: file not found" was returned, the URL was treated as a dead link; otherwise, the URL was considered to exist but not to be indexed by the search engine. Table 3 summarizes this process.
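The process just described can be paraphrased in Python. Here `query_index` and `http_status` are stand-ins for the engine-specific "site:" (or URL-string) search and the follow-up HTTP request, so the sketch can be exercised with stubs:

```python
def coverage_status(url: str, query_index, http_status) -> str:
    """Classify one sample URL for one engine (cf. Table 3):
      - "indexed":     the engine returns the URL for a site:/URL-string query
      - "dead":        not indexed, and the URL itself returns HTTP 404
      - "not_indexed": the URL exists but the engine has not indexed it
    `query_index(url)` -> bool; `http_status(url)` -> int HTTP status code.
    """
    if query_index(url):
        return "indexed"
    if http_status(url) == 404:
        return "dead"
    return "not_indexed"
```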

Table 3 Pseudo code of algorithm for searching the URLs

5 Results

The study results are presented in this section. An overview of the issues surrounding Greek language processing and their effects on searching is given first. The evaluation of the 309 navigational queries, which were searched in the 10 search engines (5 Greek, 5 global), follows. The freshness of each search engine's index is measured by the percentage of the returned URLs that were live (active) or dead (non-active) links, and the extent of the coverage of the Greek web by the search engines is measured by evaluating whether a sample of active URLs from the Greek web appears in their indices.

5.1 Greek language processing by search engines and effects on searching

The way search engines handled the Greek language is presented in Table 4. The table shows whether the engines handled articles, prepositions, pronouns, etc. The table also reports on whether the results of Greek language queries that are submitted to search engines with or without accent marks are the same. For example, a searcher using either keyword “χωριο” or “χωριό” (village) as a query would expect to get the same results because the accent mark does not change the meaning of the word. However, this is not the case as reported in Table 4.

Table 4 How search engines process Greek language input

The five global search engines and one Greek engine returned different results for the accented and unaccented forms. The differences observed in the top ten results varied from totally different results to some small overlap, but with differences in rank order.

It appears that Google, MSN, and Yahoo handle Greek in very similar ways, which amount to the following: they do not use any Greek-specific processing software, so they perform no special segmentation or stemming for Greek. The default algorithm for Greek, or any non-English language, seems to be simple white-space delimiting to find words, followed by indexing of these words minus a universal stop word list. At a minimum, these search engines seem to recognize at least two encodings, Unicode and the Windows Greek code page.
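The behaviour just described, white-space tokenization minus stop words with no accent folding, can be contrasted with the normalization step that would make "χωριο" and "χωριό" match. The sketch below is ours; the stop word list is a tiny illustrative subset, not any engine's actual list.

```python
import unicodedata

GREEK_STOPWORDS = {"και", "το", "η", "ο", "σε"}  # tiny illustrative subset

def tokenize(text: str) -> list[str]:
    """The apparent default behaviour: split on white space, drop stop words."""
    return [t for t in text.lower().split() if t not in GREEK_STOPWORDS]

def strip_accents(text: str) -> str:
    """The accent folding the engines appear NOT to do: decompose to NFD,
    drop combining marks (the accents), recompose to NFC."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)
```

With such folding applied at indexing and query time, the accented and unaccented forms of the example query would retrieve the same results.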

5.2 Search results by rank order and P@10

The 309 navigational queries, 217 in Greek and 92 in English, were submitted to each of the 10 search engines, for a total of 3090 searches. Table 5 presents the rank distribution of the results for both the Greek and English queries by search engine for 2004 and 2006. The table also lists the number of organizations missed by each engine and their success rate measured as precision at 10 (P@10). Of the organizations found, most appeared in the first three ranks.

Table 5 Rank position of the top ten search results for the 309 queries by search engine

The global search engines have higher success rates for both of the comparison years than the Greek engines. In 2004 the performance of the global engines ranges from 54.04% to 68.61% and in 2006 from 48.54% to 73.79%. The Greek engines range from 10.03% to 58.58% in 2004 and from 10.68% to 52.43% in 2006. Google is the best performing global engine and Trinity is the best Greek engine in both years. However, Trinity is ranked fourth overall in both 2004 and 2006.

Figure 3 shows the overall success rate of the ten search engines, ranked in descending order based on their 2004 performance. What is also remarkable here is that the Greek search engines, with the exception of Visto, scored the same as or slightly worse than they had in 2004. For Anazitisis in particular, the success rate, as can be seen in Fig. 3, was more than halved. By contrast, among the global engines three did better, whereas for two of them, namely A9 and MSN, the scores differ significantly between 2004 and 2006; for A9 in particular the success rate dropped by almost a quarter. Furthermore, A9 dropped from third position in 2004 to fifth in 2006, swapping places with Yahoo. The rest of the engines maintained their positions.

Fig. 3
figure 3

Search engine success rate over all queries, 2004–2006

Table 6 shows the percentage change in relevant results retrieved between 2004 and 2006. Table 7 shows the percentage change in relevant results retrieved at the first rank, and the data are graphed in Fig. 4. The range in percentage change is wide, from −55.13% for Anazitisis to 21.88% for Visto, and four engines show a negative overall change (Table 6). The percentage change is more pronounced in the first-rank results (Table 7, Fig. 4), where five engines show negative change ranging from −5.61% (MSN) to −62.22% (Anazitisis).
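The percentage changes in Tables 6 and 7 follow the usual relative-change definition; for completeness, a one-line sketch with illustrative counts:

```python
def percent_change(count_2004: int, count_2006: int) -> float:
    """Relative change between the two data collections, in percent."""
    return (count_2006 - count_2004) / count_2004 * 100.0
```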

Table 6 Percentage change on overall results
Table 7 Percentage change on first rank results
Fig. 4
figure 4

Percentage change on first rank, 2004–2006

The above results give an overall performance rate for the search engines but do not show how the engines respond to Greek or non-Greek queries. Tables 8 and 9 present the rank distributions of the results by language, Greek and English respectively. In Table 8 it can be seen that AltaVista in 2006 handled Greek queries better than all the other engines with a success rate of 72.81%, while Google follows closely with 70.96%, whereas MSN and A9 are fourth and fifth with 50.60% and 50.23% respectively. The best performance of the Greek engines was recorded by Trinity with 49.30%. The rank distribution of the results from the queries in either English or in transliterated form is given in Table 9. These show mixed results, as we observe variations in performance for almost all the search engines. When compared to the 2006 results from the Greek queries (Table 8), Google has increased its performance (80.43%), Yahoo!’s performance remained about the same (63.04%), whereas MSN, AltaVista, and A9 decreased theirs. Of the Greek search engines Trinity’s performance increased to 59.78%, whereas the performance of all other engines decreased.

Table 8 Rank distribution of results for Greek queries, 2004–2006
Table 9 Rank distribution of results for English queries, 2004–2006

A closer look at Table 8, which depicts how the 10 engines handled Greek queries, shows the following interesting results. Judging by the success rates for both years, the last four places are clearly occupied by four Greek engines with almost identically feeble performance: Visto, Ano-Kato, Anazitisis, and Phantis, the worst. Among the remaining six engines, Trinity dropped from fourth place to sixth in 2006 and A9 from second to fifth, while Yahoo climbed from fifth to third and MSN from sixth to fourth. AltaVista finished first overall, leaving Google in second place, although Google had been the clear winner in 2004. However, and this must also be highlighted, if the analysis were based solely on how the engines scored in returning results at the first rank in each year, the ranking would be slightly different. More specifically, Google would have outperformed AltaVista, Trinity would have been two positions higher in both years, Yahoo would have maintained the positions it held on overall performance, and MSN would have remained in sixth place in both years. A9 would have dropped one place in 2004, trailing second-placed AltaVista by only one unit, but would have dropped considerably, to fifth place, in 2006. For the remaining four Greek engines the ranking would be unaltered in both years. These findings suggest that the global engines scored better than their Greek rivals, not only overall but also at the highest rank.

A similar analysis of Table 9, which records how the 10 engines handled English queries, shows the following. The ordering is more stable than in Table 8, with the discrepancies now occurring at the lower end of the scores, among the Greek engines, and towards the middle, among the global ones. More specifically, Anazitisis dropped from eighth position to tenth and A9 from third to fifth. For the rest of the engines the ordering remained unchanged, with the global engines again scoring better than the Greek ones. The only Greek engine that managed to climb slightly was Trinity. The clear winner in both years was, once again, Google. However, if the analysis were again based on the first-rank ordering, Trinity would have tied with AltaVista for second place in both 2004 and 2006. The remaining engines would have kept the positions given by the overall success rate.

A further analysis based on the second, third, and subsequent ranks is pointless, as the counts there are too small to be statistically sound. Nevertheless, these findings do suggest that Google performed better than the other engines.

To substantiate the claims made above, an analysis of variance (ANOVA) was performed on the search results, for both the 2004 and the 2006 data, using the statistical package SPSS. The analysis showed a significant difference at the 100% level in the mean performance of all 10 search engines when the entire sample of queries, both Greek and English, is taken into account. This holds for both groups of search engines, Greek and global.

When the Greek queries were evaluated separately, a significant difference at the 100% level was also found between the means of the Greek search engines; the same could not be said for the global search engines handling Greek queries. Conversely, when the English queries were analyzed, the global engines showed a significant difference at the 97% level in their mean performance, whereas this could not be established with confidence for the Greek engines. The picture when all queries are considered together resembles the latter finding. The main reason is the excess of zero entries at ranks four through ten scored by the Greek search engines when handling English queries.
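As an illustration of the kind of test behind these comparisons, a one-way ANOVA F-statistic can be computed directly from per-engine score samples. The sketch below uses small hypothetical samples, not the study's data; SPSS was the tool actually used.

```python
# One-way ANOVA F-statistic over per-engine score samples.
# All data here are hypothetical, not the study's measurements.

def anova_f(groups):
    """F = (between-group mean square) / (within-group mean square)."""
    k = len(groups)                          # number of engines
    n = sum(len(g) for g in groups)          # total observations
    grand = sum(sum(g) for g in groups) / n  # grand mean
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Three hypothetical engines' per-query scores; a large F relative to the
# F-distribution's critical value suggests the engine means differ.
f = anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

A large F leads to rejecting the hypothesis that all engines have the same mean performance, which is the form of conclusion reported above.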

Tests on the paired differences show that in some cases it can be argued with confidence that certain engines performed worse than the others: Anazitisis for all search results and for Greek queries, A9 and MSN for all search results, and MSN for English queries. The engines that performed best within their groups are Google and Trinity for all search results; Trinity, AltaVista and Google for Greek queries; and Google and Trinity for English queries.

It should be mentioned that the ANOVA tests run on the 2004 and 2006 data sets showed the same behavior for both groups, suggesting that the engines behaved similarly during both periods of the study.

Moreover, Student's t-tests for paired comparisons between the 2004 and 2006 results, by rank and category (e.g., 2004 Greek engines with Greek queries against 2006 Greek engines with Greek queries), show that the samples are statistically similar, i.e., that the sample means are the same. This holds for all searches, for both Greek and English queries, and for both groups of engines.
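The paired comparison described above can be sketched as a paired t-statistic; the samples below are hypothetical, not the study's counts.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(a, b):
    """t-statistic for paired samples: mean difference over its standard error."""
    d = [x - y for x, y in zip(a, b)]
    return mean(d) / (stdev(d) / sqrt(len(d)))

# Hypothetical per-rank counts for one engine/query group, 2004 vs 2006.
# A |t| well below the critical value fails to reject equality of means,
# the form of similarity reported above.
t = paired_t([12, 9, 7, 5], [11, 9, 8, 6])
```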

Hence the different runs in 2004 and 2006 showed an overall similar behavior in the search ability of the engines.

5.3 Mean reciprocal rank

The mean reciprocal rank (MRR) for all searches that were presented in Sect. 5.2 above was calculated and is presented in Table 10. The data in the table are sorted using the MRR results for all queries in 2006. The analysis of the data showed that for all queries (both Greek and English) Google, AltaVista, and Yahoo had increases in MRR performance from 2004 to 2006. However, Google was the only search engine that had an increase in MRR for both Greek and English queries between 2004 and 2006. All other engines either remained the same or had worse performance. Yahoo, AltaVista and Visto had some increases in the MRR performance of the Greek queries, but a drop for English queries for the same period.
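As a reminder of the measure, MRR averages the reciprocal of the rank at which the correct answer appears, counting a miss as zero. A minimal sketch, with hypothetical data:

```python
def mean_reciprocal_rank(ranks):
    """ranks: 1-based rank of the correct URL per query, or None if not found."""
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)

# Four hypothetical queries: hits at ranks 1, 2 and 4, plus one miss.
mrr = mean_reciprocal_rank([1, 2, None, 4])  # (1 + 0.5 + 0 + 0.25) / 4 = 0.4375
```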

Table 10 Mean reciprocal rank for Greek and English/Latin queries in 2004 and 2006

5.4 Search results by subject category and NQ-DCG

Using the Navigational Query Discounted Cumulative Gain (NQ-DCG) method discussed in Sect. 4.4.1.4, all queries were scored and then grouped by category, enabling a finer-grained evaluation of the performance of the search engines in the study. Table 11 shows the results of this evaluation, grouped by language and by subject category, for 2004 and 2006. Under this scoring, the larger the number, the better the retrieval performance of a search engine. Google among the global engines and Trinity among the Greek engines outperformed the other engines in their respective groups. This is not to say that Trinity's performance is good; on the contrary, when the Greek and global engines are compared, the Greek engines failed miserably.
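The exact NQ-DCG weighting is the one defined in Sect. 4.4.1.4; purely as a generic illustration of discounted cumulative gain, the standard log-discounted form can be sketched as follows (the gain values are hypothetical and not the NQ-DCG gain scale):

```python
from math import log2

def dcg(gains):
    """Standard DCG: the gain at 1-based rank i is discounted by log2(i + 1)."""
    return sum(g / log2(i + 2) for i, g in enumerate(gains))

# The same correct answer is worth more near the top of the ranking:
top = dcg([1, 0, 0])    # 1 / log2(2) = 1.0
third = dcg([0, 0, 1])  # 1 / log2(4) = 0.5
```

The discount is what lets a per-query score reward engines that place the target URL high rather than merely somewhere in the top ten.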

Table 11 Sum of the scores of the top ten results by subject category, language, and search engine

Based on the aggregate results for all search engines per category for Greek queries the coverage of the categories is in the following rank order: travel agencies, universities, banks, government departments, newspapers, colleges (TEI), radio stations, museums, transportation & communication services, TV stations.

Similarly, the aggregate results for all search engines for English queries show the following rank order of category coverage: universities, newspapers, banks, government departments, colleges, transportation & communication services, travel agents, radio stations, and TV stations. The travel agencies category shows the greatest variation in rank between the Greek and English queries, at positions 1 and 7 respectively. Newspapers also moved, from rank 5 for Greek queries to rank 2 for English queries.

The analysis of variance (ANOVA) of the results by subject category for the Greek queries (Table 11) shows a significant difference at the 100% level in the mean performance of all engines, whereas for the English queries the difference is at the 95% level. Here too, the ANOVA tests carried out on the 2004 and 2006 samples showed an overall similar behavior in the search ability of the engines when the subject category classification was considered.

The only discrepancy is found in the results of two engines: Anazitisis, for both Greek and English queries, and A9, for English queries only. Under the per-category classification of the searches (Table 11), neither Anazitisis nor A9 passed the comparative tests, although both passed them under the rank classification. The reason, as can be seen from the results (Table 11 and Fig. 3), is that the 2006 results are much worse for Anazitisis and A9; that is, these two engines performed better in 2004 than in 2006.

5.5 Live versus dead links

For each of the ten search engines, the 309 queries could generate up to 3090 results. These results were further evaluated by measuring the percentage of live (active) versus dead (non-active) links. This evaluation measures the freshness of a search engine's index and indicates the level of frustration a searcher would experience. It is an additional way of evaluating the precision of the search and the cost to users should they follow the dead links.
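A liveness check of this kind can be sketched with a HEAD request; the function names and timeout below are illustrative, and any HTTP or network error counts the link as dead.

```python
from urllib import request, error

def is_live(url, timeout=10):
    """True if the URL answers a HEAD request, False on any HTTP/network error."""
    try:
        req = request.Request(url, method="HEAD")
        with request.urlopen(req, timeout=timeout):
            return True
    except (error.URLError, ValueError):
        return False

def dead_link_rate(urls):
    """Percentage of dead (non-active) links in a result set."""
    dead = sum(not is_live(u) for u in urls)
    return 100.0 * dead / len(urls)
```

Whether an HTTP error (e.g. 404) should count as dead while a timeout merely counts as unreachable is a methodological choice; the sketch treats both as dead.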

The results presented in Table 12 show the aggregate number of returned URLs for all 309 queries submitted to each search engine, divided into those that were active (live) and those that were non-active (dead). In 2006, of the global search engines, A9 had the highest percentage of active links (96.78%), whereas Yahoo had the highest percentage of dead links (8.65%). Of the Greek search engines, Trinity had the highest percentage of active links (94.49%), whereas Ano-Kato had the highest percentage of dead links (27%).

Table 12 Active versus dead links for all queries

These results, although a good indication of the freshness of the index, should not be considered in isolation. Compared to Table 5, which shows the overall success rate of the search engines, A9, for example, successfully retrieves only 49.19% of the correct answers, with 3.22% dead links in its results, while Google retrieves 73.79% of the correct results with 4% non-active links. From a user's point of view, higher precision in the top ten results is probably the more valuable of the two.

The performance of the Greek search engines with respect to active links in their result sets is disappointing. The dead links for all engines but Trinity (5.51%) range from 13.05% to 27%; Trinity is thus the best performing Greek search engine.

Figure 5 presents the cumulative results for the active (live) and dead links found in the result sets of the Greek and English queries. From 2004 to 2006 the results for the Greek queries show a 4.88% increase in live links but a 25.11% increase in dead links. Over the same period the results for the English queries show a 2.66% drop in active links and a dramatic 44.6% increase in dead links.

Fig. 5
figure 5

Live versus dead links, 2004–2006

5.6 Response time

The response time data in Table 13 and Fig. 6 show an increase in speed from 2004 to 2006. Most search engines improved their response time, with percentage changes ranging from −0.40 to −0.95. The greatest increase in speed is seen in MSN, which had been the slowest engine in 2004. Three engines, two Greek (Phantis and Visto) and one global (A9), became slower, with percentage changes of 0.59 to 0.99. The overall faster response times could be attributed to hardware upgrades implemented at the Greek universities during that period, such as better computers in the university labs and the GRNET network upgrade both in the universities and in the backbone infrastructure (1–2.5 Gbps). Similarly, hardware upgrades at the search engines could also have contributed to the faster response times.
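The sign convention can be made explicit: change is computed relative to the 2004 time, so negative values mean a faster engine in 2006. The values below are hypothetical.

```python
def percent_change(t_2004, t_2006):
    """Relative change in response time; negative means faster in 2006."""
    return (t_2006 - t_2004) / t_2004

halved = percent_change(2.0, 1.0)   # -0.5: response time halved (faster)
slower = percent_change(1.0, 1.59)  # ~0.59: the engine became slower
```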

Table 13 Average response time over all searches per SE in seconds
Fig. 6
figure 6

Average response time by engine, 2004–2006

5.7 Search engine coverage of the Greek web

To measure the extent of coverage of the Greek Web (.gr) and the freshness of the search engines' indices, a sample of 32480 top-level-domain URLs crawled from the Greek Web was used (see Sect. 4.4.2). This sample was estimated at about 40% of the registered .gr domains in 2004 (Efthimiadis and Castillo 2004). Table 14 and Fig. 7 present the results by year and search engine, giving actual numbers and percentages of indexed and not-indexed URLs, as well as the URLs that were dead in 2006. Table 15 shows the percentage change between 2005 and 2006 for the indexed and not-indexed URLs.

Table 14 Indexed versus not-indexed URLs, 2005–2006
Fig. 7
figure 7

Indexed versus non-Indexed URLs, 2005–2006

Table 15 Percent change for indexed and not-indexed URLs, 2005–2006

In 2005, Google, with 98.04%, had almost all the URLs indexed. Yahoo follows with 84.06%, A9 with 77.41% and MSN with 66.63%. Anazitisis, with 61.72%, is at the top of the Greek search engines, while Phantis, having indexed a dismal 4.87%, is at the bottom of the list. Visto is not included in 2005 because its data was corrupted. For 2006, we observe a drop in indexed URLs across all search engines except Trinity and Phantis.

In Table 14 the data for 2006 take into consideration the decay of the URLs searched in 2005. Although the initial sample contained 32480 URLs, the “URLs checked” column of the table reports fewer URLs per engine, owing to network problems during searching. It was therefore decided to include only the successful returns, as this gives a more accurate picture. The “URLs not-indexed by SE and dead” were subtracted from the total number of “URLs checked” so as not to penalize search engines for failing to index dead pages. Since only URLs not found in a search engine's index were checked for liveness, dead URLs still present in the engines' indices were accepted without penalty, following Hawking (2001). In 2006, Google maintained its lead over all other search engines, although its coverage dropped to 91.37% of the sampled URLs. A9 also dropped, to 72.75%, while Yahoo and AltaVista, which use the same index, increased their coverage to 86.2%, and MSN to 71.61%.
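The denominator adjustment described above can be sketched as follows; the figures are hypothetical, not those of Table 14.

```python
def coverage_pct(indexed, urls_checked, not_indexed_and_dead):
    """Coverage excluding dead URLs the engine had justifiably not indexed."""
    return 100.0 * indexed / (urls_checked - not_indexed_and_dead)

# Hypothetical engine: 28,000 of 31,000 checked URLs indexed; 500 of the
# not-indexed URLs turned out to be dead, so they leave the denominator.
cov = coverage_pct(28000, 31000, 500)  # 28000 / 30500, about 91.8%
```

Removing dead not-indexed URLs from the denominator means an engine is only compared against the live pages it could reasonably have crawled.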

The bar charts in Fig. 7 provide a visual representation of the indexed versus not-indexed results from the search engines. The bars are stacked, with the 2005 results to the left of the middle gridline and the 2006 results to the right of it. In both 2005 and 2006 Google has the best coverage of Greek URLs, followed by Yahoo, AltaVista, A9, and MSN. The Greek search engines show a rather different ordering in the two years: in 2005 the best coverage was provided by Anazitisis, followed by Ano-Kato, Trinity and Phantis; in 2006 Trinity had the best coverage, followed by Ano-Kato, Anazitisis, Visto, and Phantis.

A closer examination of the results, especially the percentage change between 2005 and 2006 of the indexed and not-indexed URLs (Table 15), reveals that Google, A9, and Anazitisis had the biggest losses in coverage. Google dropped from its index about four times as many URLs as it was missing in 2005, but still has the best coverage of all engines. Yahoo, AltaVista and MSN improved their coverage at about the same rate. The most noticeable improvements came from Trinity and Ano-Kato, albeit still performing below the global engines.

6 Conclusions

This study evaluated how search engines handle Greek language queries, assessed whether the Greek or the global search engines are more effective in satisfying user requests for navigational queries, and measured the extent of coverage of the Greek web and the freshness of the engines' indices. Ten search engines were evaluated, five Greek and five global. Our results corroborate and extend the findings of Lazarinis (2007). The analysis shows that the global search engines ignore the special characteristics of the Greek language when handling Greek queries. Despite this, the global search engines outperformed the Greek engines in both years of the evaluation, 2004 and 2006. A set of 309 navigational queries was used in the evaluation. The rank distribution of all search results indicates that, on average, the search engines retrieved the relevant target URL in the first three rank positions. However, the rate of success leaves much to be desired, as the most successful engine, Google, found the correct answer to only 73.91% of the English and 60.37% of the Greek queries. The global engines have good coverage of the Greek web relative to the sample of 32480 URLs tested, but the results returned differ depending on how the searcher types the Greek query, e.g., with or without accents.

The implications for Greek users are therefore many, as they need to be aware of these nuances when searching in Greek. The study was conducted during different periods: in 2004 and 2006 for the navigational queries, and in 2005 and 2006 for the indexing coverage of the sample of 32480 URLs. The coverage of the URLs for 2006 ranged from as low as 12.6% for Phantis to as high as 91.37% for Google. Although Google's coverage seems high, it has dropped since 2005.

The results obtained were statistically analyzed to establish that the sample means of the search outcomes differed per engine. It was also statistically confirmed that the behavior of the engines was similar across the different years.

A possible explanation for the poor performance of the Greek search engines is the lack of the sophisticated crawling, searching, and ranking algorithms found in the global search engines. The Greek search engines also have very low coverage of the Greek web, ranging from 4.87% to 61.72% (see Table 14).

Although the global search engines outperformed the Greek engines, there is much room for improving their performance in both retrieval effectiveness and coverage. Given the better performance of the global engines reported in this study, it could be expected that if they took the characteristics of the Greek language into account, their performance would improve further.