Science enabled by specimen data

Grigoropoulou, A., S. A. Hamid, R. Acosta, E. O. Akindele, S. A. Al‐Shami, F. Altermatt, G. Amatulli, et al. 2023. The global EPTO database: Worldwide occurrences of aquatic insects. Global Ecology and Biogeography. https://doi.org/10.1111/geb.13648

Motivation: Aquatic insects comprise 64% of freshwater animal diversity and are widely used as bioindicators to assess water quality impairment and freshwater ecosystem health, as well as to test ecological hypotheses. Despite their importance, a comprehensive, global database of aquatic insect occurrences for mapping freshwater biodiversity in macroecological studies and applied freshwater research is missing. We aim to fill this gap and present the Global EPTO Database, which includes worldwide geo-referenced aquatic insect occurrence records for four major taxa groups: Ephemeroptera, Plecoptera, Trichoptera and Odonata (EPTO).

Main type of variables contained: A total of 8,368,467 occurrence records globally, of which 8,319,689 (99%) are publicly available. The records are attributed to the corresponding drainage basin and sub-catchment based on the Hydrography90m dataset and are accompanied by the elevation value, the freshwater ecoregion and the protection status of their location.

Spatial location and grain: The database covers the global extent, with 86% of the observation records having coordinates with at least four decimal digits (11.1 m precision at the equator) in the World Geodetic System 1984 (WGS84) coordinate reference system.

Time period and grain: Sampling years span from 1951 to 2021. Ninety-nine percent of the records have information on the year of the observation, 95% on the year and month, while 94% have a complete date. In the case of seven sub-datasets, exact dates can be retrieved upon communication with the data contributors.

Major taxa and level of measurement: Ephemeroptera, Plecoptera, Trichoptera and Odonata, standardized at the genus taxonomic level. We provide species names for 7,727,980 (93%) records without further taxonomic verification.

Software format: The entire tab-separated value (.csv) database can be downloaded and visualized at https://glowabio.org/project/epto_database/. Fifty individual datasets are also available at https://fred.igb-berlin.de, while six datasets have restricted access. For the latter, we share metadata and the contact details of the authors.
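
The link between "four decimal digits" and "11.1 m precision at the equator" in the abstract can be checked with a little arithmetic. The following is a minimal sketch, not part of the EPTO tooling: the function names are hypothetical, and the constant of roughly 111.32 km per degree at the equator is an assumed approximation for WGS84.

```python
# Assumed approximation: one degree of longitude at the equator spans about 111.32 km.
METERS_PER_DEGREE_AT_EQUATOR = 111_320.0


def decimal_digits(coordinate: str) -> int:
    """Count the digits after the decimal point of a coordinate given as text."""
    _, _, fraction = coordinate.strip().partition(".")
    return len(fraction)


def precision_at_equator_m(digits: int) -> float:
    """Ground distance, in metres, of the last reported decimal digit at the equator."""
    return METERS_PER_DEGREE_AT_EQUATOR * 10 ** (-digits)


def meets_threshold(lat: str, lon: str, min_digits: int = 4) -> bool:
    """True if both coordinates are reported with at least `min_digits` decimals."""
    return min(decimal_digits(lat), decimal_digits(lon)) >= min_digits


if __name__ == "__main__":
    for digits in range(1, 6):
        print(digits, "decimal digits:", round(precision_at_equator_m(digits), 2), "m")
```

Coordinates are handled as text here because parsing them to floating point would discard trailing zeros and so understate the reported precision.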

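The random forest model has attracted wide attention since its publication in 2001. Because random forest can perform regression, discrimination and other statistical analyses, and is not constrained by the assumptions of parametric tests such as normality, homogeneity of variance and independence of the predictors, its use has become increasingly common, and it tends to be regarded as a universal model. In fact, random forest is a model with distinctive characteristics: it fits observations through local optimization, and its results are often inaccurate when analysing data with partial-effect relationships. Using distribution data of Cicadidae species as an example, this paper compares random forest with multiple linear regression, generalized additive models and artificial neural networks in regression analysis, and with linear discriminant analysis in discriminant analysis, and emphasizes the fragmented nature of random forest predictions. The results show that random forest, when handling data with multicollinearity and interactions, and in discriminant…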

Li, X., B. Li, G. Wang, X. Zhan, and M. Holyoak. 2020. Deeply digging the interaction effect in multiple linear regressions using a fractional-power interaction term. MethodsX 7: 101067. https://doi.org/10.1016/j.mex.2020.101067

In the multiple regression $Y \sim \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \varepsilon$, the interaction term is quantified as the product of $X_1$ and $X_2$. We developed fractional-power interaction regression (FPIR), using $\beta X_1^{M} X_2^{N}$ as the interaction term. The rationale of FPIR is that the slopes of the $Y$-$X_1$ regression along the $X_2$ gr…
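
To make the idea of a fractional-power interaction term concrete, here is a minimal sketch in plain Python. It is not the authors' procedure: it assumes the full model is Y ~ β0 + β1·X1 + β2·X2 + β3·X1^M·X2^N + ε, assumes strictly positive predictors so that fractional powers are defined, and picks M and N by a simple grid search that minimizes the residual sum of squares. All function names and the simulated data are hypothetical.

```python
import itertools
import random


def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    size = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * size
    for i in reversed(range(size)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, size))
        x[i] = (aug[i][size] - tail) / aug[i][i]
    return x


def fit_ols(rows, y):
    """Ordinary least squares via the normal equations; returns (coefficients, SSE)."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coefs = solve_linear(xtx, xty)
    sse = sum((yi - sum(c * v for c, v in zip(coefs, r))) ** 2
              for r, yi in zip(rows, y))
    return coefs, sse


def fit_fpir(x1, x2, y, exponents):
    """Grid-search M and N; returns (SSE, M, N, coefficients) for the best fit."""
    best = None
    for m_exp, n_exp in itertools.product(exponents, repeat=2):
        # Design matrix columns: intercept, X1, X2, X1^M * X2^N.
        rows = [[1.0, a, b, (a ** m_exp) * (b ** n_exp)] for a, b in zip(x1, x2)]
        coefs, sse = fit_ols(rows, y)
        if best is None or sse < best[0]:
            best = (sse, m_exp, n_exp, coefs)
    return best


if __name__ == "__main__":
    # Simulated data with a known interaction term 0.5 * X1^0.5 * X2^1.5.
    rng = random.Random(0)
    x1 = [rng.uniform(0.5, 3.0) for _ in range(200)]
    x2 = [rng.uniform(0.5, 3.0) for _ in range(200)]
    y = [1 + 2 * a + 3 * b + 0.5 * a ** 0.5 * b ** 1.5 + rng.gauss(0, 0.001)
         for a, b in zip(x1, x2)]
    sse, m_exp, n_exp, coefs = fit_fpir(x1, x2, y, [0.5, 1.0, 1.5, 2.0])
    print("best M, N:", m_exp, n_exp, "SSE:", sse)
```

Setting M = N = 1 in the grid recovers the ordinary product interaction, so the conventional model is one candidate among those compared. Exponents of 0 are left out of the example grid because they would make the interaction column collinear with the intercept or a main effect.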