1 Background

Big data is a buzzword in today’s networked society. From ordinary individuals to the country as a whole, people enjoy the convenience of big data to varying degrees. In medical services, retail, finance, manufacturing, logistics, telecommunications and other industries, research on and application of big data have begun and have created substantial social value. Government departments also attach great importance to big data technology. In March 2012, the Obama administration announced that the U.S. Government would invest 200 million dollars in a big data research and development program. The Chinese government has also recognized the huge potential of big data and is formulating policies to promote its development and application.

The application of information acquisition, mass data storage, high-speed data transmission and intelligent data analysis has had a far-reaching impact on contemporary agricultural research. The Chinese Academy of Agricultural Sciences (hereinafter referred to as CAAS), as the country’s most authoritative agricultural research institution, has already felt the great influence of big data. In 2014, CAAS drew up the “China Academy of Agricultural Sciences information development planning (2015–2019)” (hereinafter referred to as the planning). The planning states that the main tasks are to build an e-Science platform, promote the informatization of scientific research conditions, management and research output, and enhance the capacity to acquire, protect and utilize scientific data and resources. The acquisition and utilization of scientific research data are therefore important. At the same time, the planning points out the problems in data construction.

2 Current Situation and Problems

The CAAS web portal comprises the websites of 10 functional departments and 41 research institutes. It serves two major functions, departmental office work and research services, and is an important window for showing the academy’s research strength and its service to agriculture. Since it went online in 1997, the website has played an important role in scientific research and management. The CAAS portal now includes 51 secondary sites. According to statistics, however, the total number of websites in CAAS has reached 236. The detailed survey results are shown in Table 1.

Table 1. Number of existing sites and views

Construction of web databases began in the 1980s, and their number had reached 185 as of 2014. Several early examples follow. In the 1980s, the feed nutrition database was jointly built by the Institute of Animal Sciences and the Computing Center, led by Academician Ziyi Zhang. The seed resources database, built in the 1990s by Professor Xian Zhen Zhang, had reached 42 million types by 2014. The agricultural science and technology data platform, led a few years ago by Professor Xianxue Meng, has been put into use; it covers 12 disciplines, 72 subject database groups, 672 databases and 1987 TB of data, and is browsed by 50–60 million people every day. The seed resources database in China has accumulated hundreds of millions of seed resource records. With the development of genetics and breeding, precisely measured gene sequences of important crops and animals have formed a huge gene database. There are also many national scientific data and resource data centers of various scales.

However, the capacity for data collection and integration is weak. There are only 14 data websites in the CAAS portal. According to statistics from 2014, the other 185 data resource websites are independent of the CAAS portal. None of these data resources can be shared or analyzed together. Website data storage formats and locations are shown in Table 2.

Table 2. Web site data storage format and location

2.1 Network Is Weak

2.1.1 Lack of Overall Website Construction Planning and Financial Support

Since the CAAS portal was set up, website establishment, development and daily maintenance have been handled by each institute. Because there is no unified planning or dedicated funding, the level of website construction and service is uneven, and databases are poorly built and maintained. Without a unified management and data sharing platform, data sharing and analysis will be difficult in the future.

According to statistics, nearly 2/5 of the institutes have no operation and maintenance budget. Compared with the Chinese Academy of Sciences (CAS), CAAS funding for information development is far smaller. CAS began its information construction in the “fifth” period, and by the end of “Thirteen Five” its cumulative investment amounts to several million dollars.

2.1.2 Network Infrastructure Environment Is Weak

According to statistics, the 41 research institutes and 10 departments of CAAS generated approximately 10 TB of data per day in 2014, with an annual growth rate of 10 %, while the storage capacity of the existing CAAS network center room cannot meet demand. The CAAS institutes are distributed across 24 provinces. Since there are no interconnected data transmission lines, research data cannot be shared or transmitted among institutes, bases, test stations and field stations.
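To make the storage pressure concrete, the survey figures above (about 10 TB per day in 2014, growing 10 % per year) can be turned into a rough projection. The sketch below is only illustrative: it assumes 365 days per year and that the daily volume steps up by 10 % once a year from the 2014 baseline, neither of which is stated in the survey, and the function names are hypothetical.

```python
# Rough projection of CAAS data volume from the 2014 survey figures.
# Assumptions (not from the survey): 365 days per year, and the daily
# volume grows by 10 % once per year, compounding from the 2014 baseline.
DAILY_TB_2014 = 10.0
ANNUAL_GROWTH = 0.10
DAYS_PER_YEAR = 365


def yearly_volume_tb(years_after_2014: int) -> float:
    """Data generated in one year, in TB, `years_after_2014` years on."""
    daily = DAILY_TB_2014 * (1 + ANNUAL_GROWTH) ** years_after_2014
    return daily * DAYS_PER_YEAR


def cumulative_volume_tb(num_years: int) -> float:
    """Total data generated over `num_years` years starting in 2014."""
    return sum(yearly_volume_tb(k) for k in range(num_years))


if __name__ == "__main__":
    for k in range(5):
        print(2014 + k, round(yearly_volume_tb(k)), "TB")
    print("5-year total:", round(cumulative_volume_tb(5)), "TB")
```

Under these assumptions, the first year alone amounts to roughly 3,650 TB, which gives a sense of the capacity a central storage facility would need to plan for.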

2.2 Lack of Unified Planning of Data Resource Construction

Data resource construction is long-term work and needs a sound system to ensure data accuracy and availability. For lack of unified standards, data must be collected and entered repeatedly and maintained by multiple departments, so information is updated asynchronously and inconsistently. The specific problems fall into the following aspects:

First, there is no unified data format or compatibility standard. According to the survey, all 42 institutes have built their own financial and personnel platforms, but because CAAS has not established unified system compatibility standards, data in one department are incompatible with those of another. As the amount of data grows rapidly, this will increase the management workload and make the work harder.

Second, CAAS lacks research management platforms, such as a platform for sharing research software and scientific instruments. Instruments and software account for an important part of research spending. Because CAAS has no unified purchasing plan or sharing mechanism for research software and instruments, they are purchased repeatedly and underused.

Third, there is no mechanism for storing scientific research data. Most research data are edited and saved by individual groups, unsystematically and without professional management. According to statistics in 2014, 115 research institutes or units had built their own websites, with about 3.3 websites per institute on average. But for lack of unified management and effective utilization, the usage rate of research data is low.
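The first problem above, the absence of a unified data format, can be illustrated with a small sketch. It assumes two institutes that store the same personnel information under different field names and shows how a shared schema plus a per-institute field mapping would make the records comparable. The schema, the institute keys, the field names and the sample values are all hypothetical and are not taken from any CAAS system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StaffRecord:
    """Hypothetical academy-wide personnel record (illustrative schema)."""
    staff_id: str
    name: str
    institute: str


# Hypothetical per-institute field names mapped onto the shared schema.
FIELD_MAPS = {
    "institute_a": {"id": "staff_id", "full_name": "name"},
    "institute_b": {"employee_no": "staff_id", "staff_name": "name"},
}


def to_unified(source: str, raw: dict) -> StaffRecord:
    """Convert one institute-specific record into the shared format."""
    mapping = FIELD_MAPS[source]
    # Normalize every mapped value to a stripped string.
    fields = {unified: str(raw[local]).strip()
              for local, unified in mapping.items()}
    return StaffRecord(institute=source, **fields)


if __name__ == "__main__":
    print(to_unified("institute_a", {"id": 7, "full_name": " Li Hua "}))
    print(to_unified("institute_b", {"employee_no": "7", "staff_name": "Li Hua"}))
```

Once every platform exports through such a mapping, records only need to be entered once and can be exchanged between departments without manual re-keying.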

2.3 Low Data Processing Capacity

As requirements for computing and storage resources continue to rise, a high-performance data processing platform is a necessary condition for handling data resources. The development and use of high-performance computing began in the 1980s. The national “973” and “863” programs have invested heavily in high-performance computing research, and it has been applied in many fields such as defense and security, oil exploration, weather forecasting, bioinformatics, genetics and nanotechnology. In the “fifteen” period, CAS made remarkable achievements in its high-performance computing environment and developed the domestic supercomputers Shuguang 2000-II and Lenovo DeepComp 6800. CAAS, however, has no high-performance computing center of its own, and all of its high-performance computing demands depend on outside organizations. Because high-performance computing for different research directions has different, highly specialized software and hardware requirements, results computed at outside centers cannot meet the expected requirements, and data security cannot be guaranteed. The 2014 survey showed that more than 2/3 of the institutes, particularly those in animal husbandry, biology, and vegetables and flowers, had used high-performance data processing, while the other institutes also expressed a need for it.

2.4 Website Service Model Lacks Diversity

In the current era of rapid Internet development, people access information and communicate in increasingly diverse ways, and new media such as microblogs (Weibo) and WeChat have become an important way of communicating information. Smartphones, tablet PCs and other mobile Internet terminals are growing explosively. As of June 2013, China’s mobile Internet users had reached 464 million. Mobile office work will become a more relaxed and effective way of working in the future. But the CAAS portal offers only a traditional Internet platform and lacks functions for releasing data resources through diverse channels.

3 Suggestions for Future Web Data Resource Construction

The 2014 planning proposed that future information technology development should center on the overall goal of “building world-class agricultural research institutes”: to build a domestically first-class information service platform for agricultural research institutions and to provide first-class information technology services for agricultural research through the integration and sharing of agricultural research data.

3.1 Improve System of Data Resources Construction

Database construction is a long-term project. To ensure that it proceeds smoothly, we need to establish clear rules and regulations on funding, personnel, data collection, storage and so on.

3.1.1 Have a Realistic Implementation Plan and Funding

The 2014 planning lists an implementation timetable for specific information construction, including the planned storage and computing platforms and the funds needed for data resource construction. The program is divided into two phases: the first five years will focus on the construction of network infrastructure, and the second phase will deepen services and data mining. The tasks of the first five-year plan have been identified; the specific implementation plan is shown in Table 3.

Table 3. The planning timetable from 2015 to 2020

3.1.2 Construction of the Data Resources Needs Full-Time Institutions and Technical Personnel

We should establish an academy-level joint information technology agency and improve the information technology system of each institute. Currently, the Agricultural Information Institute is responsible for information construction in CAAS. Only two related offices, with 25 employees, are in charge of network maintenance and site updates, and other information technology personnel are even scarcer. Under the planning, CAAS will establish a 100-person team for network operation and maintenance and website editing, set policies to improve the staffing and operating system for information technology, and regularly train and assess technical personnel. We therefore need to establish comprehensive, centralized information management and form a working mechanism with a clear division of labor for each institute. On this basis, we would explore a management and service model in which the academy coordinates, each institute participates, and all cooperate for mutual benefit.

3.2 To Improve Network Infrastructure Construction Environment

Network infrastructure is a prerequisite for data storage and computing. To provide the network infrastructure environment that data resource construction requires, we should understand the needs of each network. First, it is necessary to identify the network requirements clearly and to design the network’s functions, structure and layout as a whole. Meanwhile, we should improve the power, monitoring, fire protection, refrigeration and other facilities. The planning proposes that, within five years, CAAS will complete network infrastructure that meets the needs of information technology development for the next decade (Table 4). Second, we should construct high-speed data transmission channels, improve the network transmission environment and establish barrier-free data sharing and transmission lines linking the 42 institutes distributed across 20 provinces.

Table 4. 2015–2020 the central office-based environment construction plan

3.3 To Build Network Platform of the Development for Data Resources

Data resources are among the main achievements of scientific research. The research firm IDC predicts that the world’s raw data storage capacity will increase by more than 50 % annually, and that data are growing not only in volume but also in complexity. Therefore, establishing massive data collection and storage platforms and a computing center will be a necessary condition for the development of research institutes in the era of big data.

3.3.1 Construction of a Big Data Cloud Storage Center

Big data analysis has been carried out and rapidly deployed in many research areas. In genomics and proteomics in particular, data will grow faster than IT design and development can keep pace with, and the data storage and processing capacity of the existing data center cannot meet the future needs of scientific computing and analysis. The results of the information technology survey are shown in Fig. 1. They indicate that, with data growing rapidly, pressure on storage capacity is the biggest problem in network construction. Therefore, building a cloud storage center is a necessary condition for storing big data in the future.

Fig. 1. Chinese Academy of Agricultural Sciences research network storage problems

The planning proposes to establish a data cloud storage platform that integrates information and eliminates information islands, and to provide long-term data preservation and remote backup services, as well as online storage services for the associated monitoring and surveillance data.

3.3.2 Construction of a High-Performance Computing Data Center

In contemporary agricultural research, scientific data are surging, and the possibility of new scientific findings depends increasingly on the capacity to acquire and process a sufficient amount of data. Establishing a high-performance computing data center can provide high-performance computing services for genomics, proteomics, bioinformatics, new materials and other fields.

CAAS has a strong demand for high-performance computing. According to the 2014 survey results, more than 2/3 of the institutes are applying high-performance computing, and the remaining institutes, about 1/3, also need it. With the development of agricultural information technology, agricultural research is becoming comprehensive and interactive. There is an urgent need for collaborative research teams that cooperate across fields and regions. Collecting all types of information and mining and analyzing data have become the main direction of agricultural research and development. Because the databases have accumulated massive data on flora and fauna resources, monitoring and sensing data, and network data, traditional data mining models cannot meet the demand for computing power; new data mining models with high-performance computing capabilities therefore need to be established.

High-performance computing centers should be built in line with the actual needs of research work. Based on the institutes’ requirements for high-performance computing applications, we can build a high-performance computing platform that includes a parallel computing platform for scientific research, a system application software platform and tools.

High-performance computing centers can also provide analysis and processing services for remote sensing data, sensor data, network data and other big data to domestic and international users. Building a cloud computing platform will enable the effective mining, integration and sharing of dispersed resources.

3.3.3 Construction of the Data Resource Collection and Processing Center

Website statistics show that in 2015, 227 of the 236 sites used database management for the data they generated. Construction of the data collection and processing center can rely on the network resource data platform to support decentralized or centralized storage of agricultural natural resource data and scientific data, and to achieve centralized remote integration and sharing of agricultural scientific data resources. By constructing an agricultural sciences data integration center, we can improve the efficiency of data query services and protect data security.

For the experimental (test) data generated by agricultural science laboratories, field stations and observation stations, we can deploy an architecture consisting of a data collection layer, a field control layer, a data storage layer and a business application layer, and establish a wireless sensor network system to complete the agricultural production monitoring network. Eventually, the Internet of Things will transmit information to the data collection center.

Ultimately, by constructing a networked data center platform, we can provide a network platform for the intelligent collection and storage of experimental (test) data and improve the processing efficiency of field observation data.
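The four-layer architecture described above (data collection, field control, data storage and business application) can be sketched in a few lines of code to show how data would flow from a field station to a report. This is a minimal illustration under stated assumptions, not the design in the planning: the class and function names, the in-memory store and the 0 to 100 validity range are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Reading:
    station: str   # hypothetical station identifier
    metric: str    # e.g. "soil_moisture"
    value: float


def collect(raw_samples):
    """Data collection layer: wrap raw (station, metric, value) tuples."""
    return [Reading(s, m, float(v)) for s, m, v in raw_samples]


def field_control(readings, valid_range=(0.0, 100.0)):
    """Field control layer: drop readings outside an assumed plausible range."""
    low, high = valid_range
    return [r for r in readings if low <= r.value <= high]


class DataStore:
    """Data storage layer: in-memory stand-in for the data collection center."""

    def __init__(self):
        self._rows = []

    def save(self, readings):
        self._rows.extend(readings)

    def query(self, station, metric):
        return [r.value for r in self._rows
                if r.station == station and r.metric == metric]


def station_average(store, station, metric):
    """Business application layer: a simple report over stored data."""
    values = store.query(station, metric)
    return mean(values) if values else None


if __name__ == "__main__":
    store = DataStore()
    samples = [("s1", "soil_moisture", 30), ("s1", "soil_moisture", 50),
               ("s1", "soil_moisture", 999)]  # last value is a sensor glitch
    store.save(field_control(collect(samples)))
    print(station_average(store, "s1", "soil_moisture"))
```

Keeping each layer behind its own function or class means the in-memory store could later be swapped for a real database, or the range check for station-specific rules, without touching the other layers.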

3.3.4 Establishing a National Agricultural Resources Internet of Things Monitoring Platform

Establishing a national agricultural resources monitoring platform can integrate multiple agricultural condition monitoring resources and expand and upgrade the existing infrastructure and software integration systems. Following the principles of cloud computing and cloud service management, we can design and build the platform architecture, deploy platform equipment, develop the software system, and then gradually establish a national agricultural condition monitoring platform. The platform can cover the areas of the country’s main crop types, provide real-time dynamic information and long-term accumulated data resources for agricultural research and production, and provide support for agricultural production and scientific guidance for disaster emergency management.

3.4 Broaden the Web Service Model and Expand Resource Sharing and Communication Platforms

The 2014 planning proposes using the latest technologies, such as responsive web design, to optimize the design and development of an adaptive Chinese portal, and to support access from smartphones, tablet computers, TVs, PC monitors, and iOS and Android mobile devices.

Building new media platforms, such as Weibo and WeChat accounts, is another task; these can broaden personalized information services, active push and diversified information services.

4 Conclusion

An era of large-scale data production, sharing and application is just beginning. As Professor Victor said, the real value of big data is like an iceberg floating in the ocean: at first glance we can see only the tip, while most of it is hidden beneath the surface. With the arrival of the big data era, the way research is carried out will also change greatly, and the construction of website data resources will become an important part of the work of all researchers.