1 Introduction

Every religion in the world has a number of guides and directives to be followed by its believers, and Islam is no exception. Islamic laws are derived from four legislative sources: the Quran (the holy book), the Hadith (the words and acts of the Prophet Mohamed, peace be upon him), Ijma (consensus of the companions) and Quias (analogical deduction).

The Quran is the main religious source of Islamic rulings, or jurisprudence. One of its miracles is its unique style, written in classical Arabic. Linguistically, it is considered the finest Arabic scripture in its expressions and words: great meanings are expressed in two or three words, and the same meaning is sometimes restated in several different ways. This holy book contains a huge amount of knowledge on various subjects, such as: the pillars of Islam, faith, general and political relations, science and art, the holy Quran, the organisation of financial relations, human and social relations, Al-Jihad, religions, judicial relations, work, stories and history, human and ethical relations, trade, agriculture and industry, and the call to Allah (Daawah).

Recently, the volume of Islamic data has grown explosively with the growth of information technology. The web is one of the most used sources for searching Islamic knowledge. However, this knowledge is scattered across the web and is not well organized, which makes it hard to access, process or reuse.

Moreover, the majority of software and websites for searching Islamic texts are keyword-based. The absence of semantic support affects the precision of the results, especially when dealing with the ambiguity of the Qur’anic text and of the Arabic language in general.

As most Islamic resources are represented as plain text or image documents, achieving machine interoperability has been a major problem. After the introduction of the semantic web, ontologies became popular in different communities such as knowledge management, natural language processing, and information retrieval. Ontologies were first defined in computer science by Gruber as “an explicit specification of a conceptualization” [1]. As a result, much work has been done to take advantage of ontologies as a machine-readable representation for semantically modeling the Islamic domain.

This paper is laid out as follows: Sect. 2 gives background information on the difficulties of Arabic and Qur’anic texts. Section 3 outlines the main sources of the Islamic religion. Section 4 gives a brief overview of some relevant works on Islamic texts. Section 5 briefly introduces our in-progress project. Section 6 presents our recent results. Section 7 outlines possible future work. Finally, we conclude the paper in Sect. 8.

2 Arabic Linguistics

Languages written in the Arabic script share, to different degrees, a high level of homography and word-sense ambiguity. Dealing with this problem is a real challenge for NLP systems. Resolving ambiguity in NLP requires the representation not only of linguistic and contextual knowledge but also of domain knowledge. Ambiguity in Arabic is enormous at every level: lexical, morphological and syntactic. Another serious problem is tokenization: it is extremely common in Arabic to find a single token that corresponds to an entire sentence in English (Fig. 1).
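To make the tokenization problem concrete, the following is a minimal sketch that greedily strips clitics from a single token. It is a toy illustration under stated assumptions, not a morphological analyser: the clitic lists are a small hypothetical subset, the function name `segment` and the `min_stem` threshold are ours, and the input uses Buckwalter transliteration so that the code stays ASCII-only.

```python
# Toy clitic segmenter (illustrative only; not a real Arabic analyser).
# Input is in Buckwalter transliteration, e.g. "wsyktbwnhA"
# (wa-sa-yaktubuna-ha, "and they will write it").

# Hypothetical, deliberately incomplete clitic lists.
PROCLITICS = ["w", "f", "s", "b", "l"]   # and, so, future marker, with, for
ENCLITICS = ["hA", "hm", "h", "k"]       # her/it, them, him/it, you


def segment(token, min_stem=3):
    """Split one token into [proclitics..., stem, enclitic] by greedy stripping.

    min_stem is an assumed guard: a clitic is only stripped if at least
    min_stem characters remain, so short words are left intact.
    """
    prefixes = []
    suffixes = []
    stem = token

    # Strip proclitics one at a time from the left.
    stripped = True
    while stripped:
        stripped = False
        for clitic in PROCLITICS:
            if stem.startswith(clitic) and len(stem) - len(clitic) >= min_stem:
                prefixes.append(clitic)
                stem = stem[len(clitic):]
                stripped = True
                break

    # Strip at most one enclitic pronoun from the right.
    for clitic in ENCLITICS:
        if stem.endswith(clitic) and len(stem) - len(clitic) >= min_stem:
            suffixes.append(clitic)
            stem = stem[:-len(clitic)]
            break

    return prefixes + [stem] + suffixes


print(segment("wsyktbwnhA"))  # ['w', 's', 'yktbwn', 'hA']
```

A single written token thus yields four units that English would write as separate words. Greedy stripping of this kind will also wrongly split words whose stem merely begins with a clitic-like letter, which is one reason why contextual and domain knowledge is needed for disambiguation.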

To facilitate comprehension of the structure of the Islamic sources, Arabic grammar rules are used to determine meaning. This is very important, since no rulings can be deduced unless the content of these sources is well understood [2].

Arabic is commonly regarded as one of the most difficult languages, although every language has its own problems and limitations. In Arabic, one such problem is agglutination: native speakers can read any text automatically without any agglutination signs, but this is more challenging for automatic processing systems and non-native speakers, as shown in Fig. 2. On the other hand, Arabic has very strict grammar rules, which can help limit the problems of automatic processing of Arabic texts. The problem is therefore the lack of research on the language rather than the difficulty of the language itself. For example, Arabic is spoken by more than 300 million people in over 22 countries, yet work on the automatic processing of Arabic or on Arabic ontologies remains scarce, and much of it is very limited compared with the progress made for other languages. Examples of such work include [3,4,5,6,7,8].

Fig. 1. Example of the Arabic language difficulties: the tokenization problem.

Fig. 2. Examples of the Arabic language difficulties: the problem of agglutination.

3 Islamic Legislative Sources

Islam has four legislative sources: the holy Qur’an, the Sunnah of the Prophet, the consensus of the companions (Ijma) and analogical deduction (Quias) (Fig. 3). In addition, Fiqh is the science of knowing the rulings of Islamic law that are extracted from these four sources. The foundation of jurisprudence (Usul EL-Fiqh), on the other hand, is the theoretical basis of the methodology: it contains the indications and methods used to extract Islamic judgments from the four legislative sources.

Fig. 3. The Islamic legislative sources.

4 Islamic Ontologies

4.1 Quranic Ontologies

Many applications based on the Quranic text have been built to facilitate information retrieval and knowledge sharing. Some works used the Qur’an in its original standard Arabic; others used a translation, for example into English or Malay. The following are some of the recent studies:

Mustapha [9] proposed a dialogue-based visualization system called AQILAH to facilitate navigating and learning the Qur’anic text. The knowledge base was built from an English version of the Qur’anic text. The prototype was able to answer a user’s query by listing the related verses.

Fouzi et al. [10] based their work on statistical and linguistic methods. They applied a linguistic pattern-based approach to extract concepts, and association rules to extract conceptual relationships, from the Qur’anic text related to the historical stories of the prophets.

Al-Yahya et al. [11] worked on designing and implementing an ontological model capable of representing Arabic language lexicons. The model was applied to the “Time” vocabulary in the Holy Qur’an. This ontology contains only 18 concepts and cannot capture the temporal sequencing of time.

Aliyu et al. [12, 13] proposed a framework capable of responding to complex natural language queries related to historical concepts mentioned in the Qur’an. They based their work on a knowledge base containing an annotated ontology in RDF, where users’ queries are reformulated to match the knowledge base representation for concept retrieval.
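The general idea of reformulating a natural language query so that it matches an RDF-style representation can be sketched as follows. This is not the framework of [12, 13]: the triples, the predicate names, the single question template and the functions `reformulate` and `match` are all hypothetical placeholders chosen for illustration.

```python
# Hypothetical knowledge base of (subject, predicate, object) triples.
# All names are placeholders, not data taken from any real ontology.
TRIPLES = [
    ("PersonA", "type", "Prophet"),
    ("PersonA", "sentTo", "PeopleX"),
    ("PersonB", "type", "Prophet"),
    ("PersonB", "sentTo", "PeopleY"),
]


def reformulate(question):
    """Map one assumed question template onto a triple pattern.

    'Who was sent to X?' becomes ('?who', 'sentTo', 'X').
    Returns None when the question does not fit the template.
    """
    template = "who was sent to "
    cleaned = question.strip().rstrip("?").strip()
    if cleaned.lower().startswith(template):
        return ("?who", "sentTo", cleaned[len(template):])
    return None


def match(pattern, triples):
    """Return one variable binding per triple that fits the pattern.

    Pattern terms starting with '?' are variables; other terms must
    be equal to the corresponding term of the triple.
    """
    results = []
    for triple in triples:
        binding = {}
        fits = True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.get(term, value) != value:
                    fits = False
                    break
                binding[term] = value
            elif term != value:
                fits = False
                break
        if fits:
            results.append(binding)
    return results


pattern = reformulate("Who was sent to PeopleY?")
print(match(pattern, TRIPLES))  # [{'?who': 'PersonB'}]
```

A real system would need many templates or a parser, and would typically delegate the matching step to an RDF store queried with SPARQL.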

Abbas [14] proposed a bilingual search website for the abstract and concrete concepts in the Qur’anic text. The website is based on a syntactic search that uses the keywords in the user’s query to retrieve answers according to their occurrence in verses. The limitation of such systems is their reliance on keyword-based algorithms, which do not provide any semantic or contextual information. To improve the results, she used eight English translations of the Qur’an and extended the search by using lemmas rather than the exact words of the queries.
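The difference between exact keyword matching and lemma-based matching can be illustrated with a short sketch. It does not reproduce the system in [14]: the texts are placeholder sentences (not translations of any verse), the lemma table is a tiny hypothetical dictionary, and the function names are ours.

```python
# Placeholder sentences standing in for indexed texts.
TEXTS = {
    "t1": "he prayed at dawn",
    "t2": "they are praying together",
    "t3": "the merchants traded fairly",
}

# Hypothetical lemma dictionary; a real system would use a lemmatizer.
LEMMAS = {
    "prayed": "pray",
    "praying": "pray",
    "traded": "trade",
    "merchants": "merchant",
}


def lemma(word):
    """Return the lemma of a word, or the word itself if unknown."""
    return LEMMAS.get(word, word)


def exact_search(query, texts):
    """Return ids of texts containing the query as an exact token."""
    q = query.lower()
    return sorted(k for k, t in texts.items() if q in t.lower().split())


def lemma_search(query, texts):
    """Return ids of texts containing any token with the query's lemma."""
    q = lemma(query.lower())
    return sorted(
        k for k, t in texts.items()
        if q in {lemma(w) for w in t.lower().split()}
    )


print(exact_search("pray", TEXTS))  # []
print(lemma_search("pray", TEXTS))  # ['t1', 't2']
```

Lemma matching improves recall over exact matching, but it remains a surface-level technique: it still cannot tell apart different senses of the same lemma, which is the semantic gap that ontologies aim to address.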

Lamraoui [15] proposed a search model incorporating Dukes’ ontology at various levels of the information search system. He integrated the ontology to represent the Qur’an and to reformulate the user’s queries.

Zaidi [16] worked on the implementation of a new process for building ontologies from Arabic texts, and on its application to the Qur’an. She proposed a hybrid method for extracting simple and complex terms, as well as semantic relations. She used the “Quranic Arabic Corpus” proposed by Dukes and the “Al-Sulaiti corpus” proposed by Eric Atwell.

Sharaf et al. [17] designed a new knowledge representation for the Qur’anic text. They proposed FrameNet frames for the Arabic verbs mentioned in the Qur’an. To handle ambiguous cases, several English translations of the Qur’an as well as books of Tafsir were used.

Among the prominent works on the Qur’an are those of Dukes, who created a morphological and syntactic annotation of the Qur’an [18]. This project was conducted at Leeds University, and its website is a worldwide collaboration. He also proposed a Qur’anic ontology in [19] that uses knowledge representation to define the key concepts in the Qur’an and shows the relationships between these concepts using predicate logic.

Saad et al. [20,21,22] used an approach based on a combination of NLP techniques, information extraction and text mining technologies to create an ontology for Islamic concepts in the Qur’an. However, the work presented in [20] covers only 63 verses related to the obligatory prayers.

Hakkoum and Raghay [23] proposed an ontology that covers the Qur’anic chapters and verses, as well as each word of the Qur’an with its root and lemma to facilitate keyword search. It does not cover search on word morphology, but the authors stated that they will add links to the QVOC ontology later on.

4.2 Hadith Ontologies

Abdelhamid et al. [24] developed a tool that helps in compiling all the authentic Hadith from the Malay translation of the six books containing the authentic collection of Hadith text (Bukhari, Muslim, Abu Dawud, Tirmidhi, Nasai, Ibn Madja). The final tool was a well-structured relational database with a user interface.

Yehya et al. [25] proposed a decision support system to judge Hadith Isnad using Ontologies. Their work is based on the methodology used by Hadith scholars.

Mohammed [26] proposed an ontology-based approach to enhance information retrieval from the Al-Shamela digital library (ADL). This work presents a method to support semantic search with complex queries by proposing a new ontology that models concepts from the ADL. For the evaluation, the results obtained from the proposed system were compared to the results obtained by the ADL. The system was applied to the Hadiths in the ADL that cover the domain of Prophetic medicine.

4.3 Islamic Legal Rulings Ontologies

Islamic legal rulings represent the divine law revealed in the Quran and the Sunnah and developed through the consensus of the companions (Ijma) and analogical deduction (Quias). Despite their importance, Islamic legal rulings are not semantically represented. To our knowledge, no work has been done in this area.

5 Our Project

Our project is to develop a new system that extracts Islamic judgments, together with the related evidence texts, from the Islamic legislative sources. Users can ask any question that requires deep reasoning, using complex natural language. The final application could be used by Muslims, by non-Muslims and by decision-makers in the field of El-Fatwa.

This work is based on ontologies representing the Islamic knowledge scattered across the four legislative sources. However, techniques for modeling knowledge from an Arabic corpus and for analyzing the knowledge contained in ontologies have been little studied, and they require deep epistemological research. Moreover, the application of such studies to Qur’anic texts is very limited. New work in this area would benefit the Islamic world and the Arabic world in general, and would support progress in other scientific fields. The purpose of this project is to present an interdisciplinary approach that allows us to correctly read, understand and interpret the Islamic legislative sources.

The work of our project is divided into two major axes:

  • The first axis consists of the construction of the ontological model representing the four legislative sources of Islam. Various problems arise at this stage, but since much work (some of it presented in this paper) has been done in this domain, it provides us with the methodological framework for this process. During this phase we create a knowledge base combining different available Qur’anic ontologies, such as the ontology developed at Leeds University. For the rest of the legislative texts, we aim to create our own ontological representation.

  • The second axis consists of the analysis of the resulting ontology in order to answer a given question. Since no existing system can analyze an ontology and return such a result, we plan to design a complementary tool for Protégé 3.2 to reach this goal.

The final result of this project is a dialogue-based system that users can query in Arabic. Where needed, the system can ask the user for more details about the query, and at the end of the dialogue it will generate an answer containing the Islamic judgment with the related verses and evidence texts.

6 Results

At this stage, we aim to collect different existing ontologies to better describe Islamic knowledge. So far, we have collected the following: the Qur’anic annotation of Dukes [19], the Solat ontology of Saad et al. [20], the domain ontology of Hakkoum [23] and our own Hadith ontology.

6.1 Our Ontological Model

The automatic processing of Arabic texts has so far produced limited results, due to the complexity of the Arabic language and the lack of tools that allow proper treatment of the language. This led us to construct our ontology manually. By analyzing the texts of the Hadith corpus, and with the help of the most relevant terms extracted with RapidMiner, we built the dictionary of concepts. We then extracted the different relationships between these concepts, which allowed us to design the conceptual model of our ontology. Finally, we implemented this ontology with the ontology editor Protégé (Fig. 4) and evaluated it with some SPARQL queries (Fig. 5).

For our ontology we used Volume 1 of “Sahih Al-Bukhari”, which contains the following books: Revelation, Belief, Ablutions, Menstrual Periods, Prayers, The Times of Prayer, etc.
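A subsumption query of the kind used to evaluate an ontology can be sketched as below. The class names and the hierarchy are hypothetical placeholders, not the concepts of our ontology, and the pure-Python traversal is a stand-in for what a SPARQL engine computes; the SPARQL 1.1 property path shown in the comment is the assumed equivalent.

```python
from collections import defaultdict

# Hypothetical (subclass, superclass) pairs; placeholder names only.
SUBCLASS_OF = [
    ("Prayer", "Worship"),
    ("ObligatoryPrayer", "Prayer"),
    ("Fasting", "Worship"),
    ("Ablution", "Purification"),
]

# Assumed SPARQL 1.1 equivalent (the ':' prefix is hypothetical):
#   SELECT ?c WHERE { ?c rdfs:subClassOf+ :Worship }


def subsumed_by(concept, edges):
    """Return all direct and indirect subclasses of a concept, sorted."""
    children = defaultdict(set)
    for child, parent in edges:
        children[parent].add(child)

    found = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        for child in children[current]:
            if child not in found:   # guards against cycles and repeats
                found.add(child)
                stack.append(child)
    return sorted(found)


print(subsumed_by("Worship", SUBCLASS_OF))
# ['Fasting', 'ObligatoryPrayer', 'Prayer']
```

The traversal follows the subclass relation transitively, so indirectly subsumed concepts such as `ObligatoryPrayer` are returned along with direct subclasses.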

Fig. 4. A sample of the hadith ontology graph in Protégé.

Fig. 5. Result of SPARQL query for concepts subsumed by

7 Future Work

  • In the Quran, and in Islamic sources in general, different words or phrases can refer to the same thing. For this reason, it is preferable to use a large collection of synonyms to help understand the meaning and avoid ambiguity.

  • Creating an ontology for Tafsir to facilitate the comprehension of Islamic knowledge.

  • Creating domain ontologies for Islamic knowledge.

  • Using Protégé with other programming languages and tools to enhance the results.

8 Conclusion

In this paper, we presented some previous works on Islamic texts. Most of the effort in this field has been directed toward the ontology building process. However, the majority of these works deal with the Qur’anic text only and remain very limited: they either focus on certain Surahs or on verses related to a given topic, with some simple queries, or they struggle to deal with the ambiguity of natural language.

To summarize, there is a strong need to create domain ontologies for the Islamic legislative sources rather than focusing effort and time on sample ontologies. Moreover, a semantic representation of the Islamic rulings is critically needed to guarantee a correct understanding of the Islamic religion.