Scientific publications

Access the HAL platform to submit or view publications resulting from PEPR scientific research.


The PEPR collection is available at https://hal.science/AGROECONUM/ or by performing an advanced search at https://hal.science/.
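For readers who would rather query the collection programmatically than through the advanced search form, the following is a minimal sketch that only builds a search URL for the public HAL API. The endpoint `https://api.archives-ouvertes.fr/search/` and the field names (`collCode_s`, `halId_s`, `title_s`, `uri_s`, `submittedDate_s`, `submittedDate_tdate`) are assumptions based on the HAL API's Solr-style interface and should be checked against the HAL API documentation before use. The helper name `collection_query_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumed public HAL search endpoint (verify against the HAL API documentation).
HAL_API = "https://api.archives-ouvertes.fr/search/"


def collection_query_url(collection: str = "AGROECONUM", rows: int = 10) -> str:
    """Build a search URL listing the most recent deposits of a HAL collection.

    Field names below are assumptions about the HAL API schema.
    """
    params = {
        "q": "*:*",                            # match every document...
        "fq": f"collCode_s:{collection}",      # ...restricted to the collection
        "fl": "halId_s,title_s,uri_s,submittedDate_s",  # fields to return
        "sort": "submittedDate_tdate desc",    # newest deposits first
        "rows": rows,                          # number of results
        "wt": "json",                          # response format
    }
    return f"{HAL_API}?{urlencode(params)}"


if __name__ == "__main__":
    print(collection_query_url())
```

The sketch deliberately stops at building the URL, so it makes no network call. The resulting address can be opened in a browser or fetched with any HTTP client.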

Important: for a deposit to be correctly associated with the AgroEcoNum collection, you must enter your ANR project code (ANR-22-PEAE-XXXX or ANR-24-PEAE-XXXX).

Acknowledgment to include in scientific publications from projects funded by the Agroecology and Digital Technology research program:

This work received government funding managed by the Agence Nationale de la Recherche under the France 2030 program as part of the Agroecology and Digital research program, reference number “ANR-**-PEAE-****”.

References by project:

ADAAPT - ANR-24-PEAE-0001
AgriFutur - ANR-24-PEAE-0002
AgroDiv - ANR-22-PEAE-0005
AGROECOPHEN - ANR-22-PEAE-0012
BIODICAPT - ANR-24-PEAE-0003
BReIF - ANR-22-PEAE-0014
CoBreeding - ANR-22-PEAE-0003
CoEDiTAg - ANR-22-PEAE-0002
EcoControl - ANR-24-PEAE-0004
HOLOBIONTS - ANR-22-PEAE-0006
LINDDA - ANR-22-PEAE-0004
MELICERTES - ANR-22-PEAE-0010
MISTIC - ANR-22-PEAE-0011
NINSAR - ANR-22-PEAE-0007
PATASEL - ANR-22-PEAE-0013
Pl@ntAgroEco - ANR-22-PEAE-0009
TwinFarms - ANR-24-PEAE-0005
WAIT4 - ANR-22-PEAE-0008
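
As an illustration, the acknowledgment sentence can be filled in from the project list above with a short script. This is a minimal sketch: the names `PROJECT_CODES`, `CODE_PATTERN`, `ACK_TEMPLATE` and `acknowledgment` are hypothetical, and the regular expression is inferred only from the codes shown on this page, not from an official ANR specification.

```python
import re

# Project codes copied from the "References by project" list on this page.
PROJECT_CODES = {
    "ADAAPT": "ANR-24-PEAE-0001",
    "AgriFutur": "ANR-24-PEAE-0002",
    "AgroDiv": "ANR-22-PEAE-0005",
    "AGROECOPHEN": "ANR-22-PEAE-0012",
    "BIODICAPT": "ANR-24-PEAE-0003",
    "BReIF": "ANR-22-PEAE-0014",
    "CoBreeding": "ANR-22-PEAE-0003",
    "CoEDiTAg": "ANR-22-PEAE-0002",
    "EcoControl": "ANR-24-PEAE-0004",
    "HOLOBIONTS": "ANR-22-PEAE-0006",
    "LINDDA": "ANR-22-PEAE-0004",
    "MELICERTES": "ANR-22-PEAE-0010",
    "MISTIC": "ANR-22-PEAE-0011",
    "NINSAR": "ANR-22-PEAE-0007",
    "PATASEL": "ANR-22-PEAE-0013",
    "Pl@ntAgroEco": "ANR-22-PEAE-0009",
    "TwinFarms": "ANR-24-PEAE-0005",
    "WAIT4": "ANR-22-PEAE-0008",
}

# Assumed format: ANR-22-PEAE-XXXX or ANR-24-PEAE-XXXX with four digits.
CODE_PATTERN = re.compile(r"ANR-(22|24)-PEAE-\d{4}")

# Acknowledgment wording taken from this page; {code} is the project reference.
ACK_TEMPLATE = (
    "This work received government funding managed by the Agence Nationale "
    "de la Recherche under the France 2030 program as part of the "
    "Agroecology and Digital research program, reference number "
    "“{code}”."
)


def acknowledgment(project: str) -> str:
    """Return the acknowledgment sentence for a project listed above."""
    code = PROJECT_CODES[project]  # raises KeyError for an unknown project
    if not CODE_PATTERN.fullmatch(code):
        raise ValueError(f"Unexpected ANR code format: {code}")
    return ACK_TEMPLATE.format(code=code)


if __name__ == "__main__":
    print(acknowledgment("WAIT4"))
```

The same `CODE_PATTERN` check could be applied to a code typed by hand before a HAL deposit, to catch a mistyped year or a missing digit.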

HAL: Latest publications

  • [hal-05230510] Seed Inference in Interacting Microbial Communities Using Combinatorial Optimization

    The behaviour of microorganisms and microbial communities can be abstracted by models combining a description of their metabolic capabilities as metabolic networks, and suitable computational or mathematical paradigms that further integrate simulation conditions. A major component of the latter is the composition of the environment or growth medium that can be referred to as seeds. Predicting the seeds from the metabolic network and an expected behaviour is an inverse problem that can be addressed with linear programming or logic paradigms such as Answer Set Programming (ASP). Here, we formalise seed prediction for microbial communities, taking into account that their members may interact positively through metabolite transfers, which may reduce the need for external seed metabolites. We address the problem with ASP and add a hybrid component ensuring the satisfiability of linear constraints. We explore the subset-minimality solving heuristic of the Clingo solver and develop two heuristics supporting priority of seeds over transfers. We present a proof of concept of seed inference in small-scale communities, and assess the scalability of the three heuristics at genome-scale. Overall, our work introduces a hybrid logic-linear model for seed inference in interacting microbial communities, and new heuristics for the exploration of the solution space with subset minimality optimisations.

    Chabname Ghassemi Nedjad, 29 Aug 2025

    https://inria.hal.science/hal-05230510v1
  • [hal-05509112] How long-lived trees remember: Epigenetic memory and priming of drought and heat stress in meristems and embryos

    With climate change accelerating the frequency and intensity of heat and drought events, forestry urgently needs strategies that enhance stress tolerance without relying solely on genetic improvement, which in trees requires decades. Priming, pre-exposing plants to mild stress or biological signals to reinforce future responses, offers a promising approach for long-lived species. Unlike annual model plants, trees experience multi-year stress cycles, making priming particularly relevant for forestry, restoration, and climate-adaptive management. Our research focuses on developmental windows and cell dividing tissues with high potential for epigenetic memory, somatic embryos and meristems, examined under water deficit, thermal stress, biochar amendment, and mycorrhizal symbiosis. Across experiments, we observe persistent molecular signatures lasting weeks to seasons, and in some cases trans-annual memory. In contrast to short-lived species where histone modifications dominate, trees often display stronger involvement of DNA methylation in these persistent states, consistent with our recent findings in maritime pine embryogenesis and poplar cambium (Trontin et al., 2025; Duplan et al., 2025; and ongoing work). More recently, we investigated how biochar and beneficial root symbioses interact with drought priming in poplar. These studies form the basis of long-term research frameworks and national programs, including EPIMYC (ANR-24-CE20-5751) and the PEPR Agroecology & Digital initiative (ANR-24-PEAE-0001). Ultimately, our goal is to integrate omics layers to build predictive models of priming responsiveness and epigenetic plasticity, enabling identification of biomarkers and management-ready diagnostic tools to guide climate-adaptive forestry.

    References

    1. Trontin, J.F., Sow, M.D., Delaunay, A., Modesto, I., Teyssier, C., Reymond, I., Canlet, F., Boizot, N., Le Metté, C., Gibert, A., Chaparro, C., Daviaud, C., Tost, J., Miguel, C., Lelu-Walter, M.A., & Maury, S. 2025. Epigenetic memory of temperature sensed during somatic embryo maturation in 2-yr-old maritime pine trees. Plant Physiology, 197(2), kiae600. https://doi.org/10.1093/plphys/kiae600
    2. Duplan, A., Feng, Y.Q., Laskar, G., Cai, B.D., Segura, V., Delaunay, A., Le Jan, I., Daviaud, C., Toumi, A., Laurans, F., Sow, M.D., Rogier, O., Poursat, P., Duruflé, H., Jorge, V., Sanchez, L., Cochard, H., Allona, I., Tost, J., Fichot, R., & Maury, S. 2025. Drought induced epigenetic memory in the cambium of poplar trees persists and primes future stress responses. bioRxiv 2025.10.14.681991. https://doi.org/10.1101/2025.10.14.681991

    Stéphane Maury, 13 Feb 2026

    https://hal.science/hal-05509112v1
  • [hal-05500190] Epigenetic regulation of mycorrhizal symbioses: from plastic responses to transgenerational legacies

    Mycorrhizal symbioses represent one of the most widespread and ecologically significant plant–microbe interactions, shaping plant nutrition, stress resilience, and ecosystem functioning. Beyond their role in nutrient exchange and systemic defense, growing evidence suggests that these symbioses also influence plant plasticity within and across generations through epigenetic regulation. These mechanisms operate throughout the mutualistic interaction, from fungal recognition and root colonization to symbiosis functioning, by regulating gene networks that control signaling, defense suppression, and nutrient exchange. By integrating environmental cues into potentially heritable gene regulatory states, epigenetic regulation fine‐tunes within‐generation responses and may also contribute to effects across generations, thereby influencing adaptation and resilience. The extent of mycorrhiza‐induced epigenetic inheritance likely depends on the host's reproductive strategy and lifespan. Clonal propagation and shorter‐lived hosts tend to preserve epigenetic marks, whereas sexual reproduction and longer‐lived species show partial resetting. This contrast shapes offspring performance, ecological interactions, and evolutionary trajectories. Here, we synthesize current knowledge on the epigenetic regulation of mycorrhizal symbioses, draw parallels with other plant–microorganism interactions (including plant–pathogens and plant–endophytes), highlight its role in within‐generation plasticity and propose a potential role across generations. We outline future research directions to disentangle the stability, ecological relevance, and evolutionary significance of mycorrhiza‐mediated epigenetic inheritance.

    Gerson Beltrán-Torres, 09 Feb 2026

    https://hal.inrae.fr/hal-05500190v1
  • [hal-05494492] Modelling and predicting soil microbial communities at large spatial scale based on metagenomic dimensionality reduction

    [...]

    Emna Stambouli, 05 Feb 2026

    https://inria.hal.science/hal-05494492v1
  • [hal-05496194] WAIT4 : Intelligence artificielle et nouvelles technologies pour évaluer des indicateurs pertinents de bien-être pour des animaux confrontés aux défis de la transition agroécologique - contribution au continuum numérique

    Improving animal welfare is essential to building sustainable food systems. Agricultural equipment (sensors, cameras, automated devices) combined with artificial intelligence (AI) can make it possible to assess the welfare of animals and herds in real time. This is particularly useful given the challenges posed by climate change and the agroecological transitions of livestock systems, as it provides tools and methods to anticipate risks and act effectively.

    Florence Gondret, 05 Feb 2026

    https://hal.inrae.fr/hal-05496194v1
  • [hal-05514911] BReIF: une e-infrastructure pour accélérer l'utilisation de ressources biologiques diversifiées

    Characterising genetic resources generates massive amounts of highly diverse data that must be analysed, managed, made reusable, and integrated in order to turn them into actionable knowledge.

    Anne-Françoise Adam-Blondon, 17 Feb 2026

    https://hal.inrae.fr/hal-05514911v1
  • [hal-05511164] Des réseaux de neurones sur graphes auto-explicatifs basés sur la logique

    Graphs are complex, non-Euclidean structures that require specialised models such as Graph Neural Networks (GNNs) to effectively capture the relational patterns associated with the class variable. This intrinsic complexity makes it particularly difficult to explain the decisions made by GNNs. Most current explainable artificial intelligence (XAI) methods applied to GNNs focus on identifying influential nodes or extracting relevant subgraphs, without clarifying how these elements actually contribute to the final prediction. To overcome this limitation, logic-based approaches aim to derive explicit rules that reflect the model's reasoning. However, existing logic-based methods remain mostly post-hoc and are limited to graph classification, leaving a significant gap in intrinsically explainable architectures. In this article, we integrate logical reasoning directly into the graph learning model. We introduce LogiX-GIN, a new self-explainable GNN architecture that incorporates logic layers to produce interpretable logic rules at the very heart of the learning process. Unlike post-hoc approaches, LogiX-GIN provides explanations that are transparent, faithful, and consistent with the model's internal computations. Evaluated on several graph-based tasks, LogiX-GIN achieves competitive predictive performance while making its decision process explicit. This work has been accepted at NeurIPS 2025.

    Alessio Ragno, 14 Feb 2026

    https://hal.science/hal-05511164v1
  • [hal-05506052] CIP-Net: Continual Interpretable Prototype-based Network

    Continual learning constrains models to learn new tasks over time without forgetting what they have already learned. A key challenge in this setting is catastrophic forgetting, where learning new information causes the model to lose its performance on previous tasks. Recently, explainable AI has been proposed as a promising way to better understand and reduce forgetting. In particular, self-explainable models are useful because they generate explanations during prediction, which can help preserve knowledge. However, most existing explainable approaches use post-hoc explanations or require additional memory for each new task, resulting in limited scalability. In this work, we introduce CIP-Net, an exemplar-free self-explainable prototype-based model designed for continual learning. CIP-Net avoids storing past examples and maintains a simple architecture, while still providing useful explanations and strong performance. We demonstrate that CIP-Net achieves state-of-the-art performance compared to previous exemplar-free and self-explainable methods in both task- and class-incremental settings, while bearing significantly lower memory-related overhead. This makes it a practical and interpretable solution for continual learning.

    Federico Di Valerio, 11 Feb 2026

    https://hal.science/hal-05506052v1
  • [hal-05446899] Animating the transition: How agriculture 5.0 revitalises agroecological principles

    Agriculture is undergoing a rapid digital transformation that challenges its ecological, social, and ethical foundations. This study explores how the transition from two revolutions, from Agriculture 4.0 (A4.0) to Agriculture 5.0 (A5.0), redefines the relationship between technology and agroecology. The dominant approach of A4.0, driven by automation, big data, and artificial intelligence, has enhanced efficiency but missed many agroecological principles, mainly those contributing to secure social equity and responsibility. Emerging as a corrective paradigm, A5.0 seeks to integrate technological progress with agroecological principles that value the social and human dimension. Adopting a scoping review following PRISMA-ScR guidelines, scientific publications indexed in Scopus and CABI up to October 2025 were screened and coded to assess how current A5.0 research embeds the thirteen agroecological principles defined by the High-Level Panel of Experts in 2019. A total of 136 documents were analysed through bibliometric and thematic synthesis. Results show that A5.0 represents a philosophical and structural evolution beyond the efficiency-oriented logic of A4.0, integrating distributed computing, explainable artificial intelligence, digital twins, and collaborative robotics within ecologically restorative and socially inclusive frameworks. However, while A5.0 strengthens resource efficiency, resilience, and certain social segments through open-source technologies and participatory design, gaps remain in policy coherence, emotional engagement, and human-machine co-learning. To address these, the study proposes two complementary agroecological principles, cognitive symbiosis and emotional ecology, emphasising shared intelligence and affective stewardship between humans, machines, and ecosystems. 
    Overall, Agriculture 5.0 reframes digitalisation as a human-ecological partnership that can operationalise agroecology's ethical goals if governed by inclusion, transparency, and regeneration rather than control and optimisation.

    Mohammad Naim, 07 Jan 2026

    https://hal.science/hal-05446899v1
  • [hal-05447092] MetaNetMap: automatic mapping of metabolomic data onto metabolic networks

    Metabolic networks represent genome-derived information about the biochemical reactions that cells are capable of performing. Mapping omic data onto these networks is important to refine model simulations. However, metabolomic data mapping remains very challenging due to difficulties in identifier reconciliation between annotation profiles and metabolic networks. MetaNetMap is a Python package designed to automatise the process of mapping metabolomic data onto metabolic networks. It includes several layers of identifier matching, the use of customisable databases, and molecular ontology integration to suggest the most matches between experimentally-identified metabolites and molecules defined in the network.

    Coralie Muller, 07 Jan 2026

    https://inria.hal.science/hal-05447092v1
  • [hal-05410799] Data Paper: HotPig, a behavioural dataset of pigs under heat stress

    The widespread use of videos in modern indoor livestock facilities coupled with the availability of efficient and low-cost computer vision algorithms provides strong incentives for continuously monitoring farm animal behaviour. Deciphering how pigs behave when experiencing prolonged heat stress is particularly important for animal welfare, as it helps us to better understand how animals use various thermoregulation and heat dissipation mechanisms. Data were collected on 24 pigs that were video-monitored day and night under two contrasted conditions: thermoneutral (TN, 22 °C) and heat stress (HS, 32 °C). All pigs were housed individually and had free access to an automatic feeder delivering pellets four times a day, and to water. After acquisition, videos were processed using YOLOv11, a real-time object detection algorithm that uses a convolutional neural network (CNN), to extract the following behavioural traits: drinking, willingness to eat, lying down, standing up, moving around, curiosity towards the littermate housed in the neighbouring pen, and contact between the two animals (cuddling). A minute frequency sampling rate was applied (each minute corresponds to 150 frames processed) for a continuous period of 16 days, spanning the two different thermal conditions (9 days on TN, 6 days on HS, 1 day back to TN). Consistency with the automatic electronic feeder’s data (also provided) was thoroughly checked. The dataset allows quantitative criteria to be analysed to decipher inter-individual differences in animal behaviour and their dynamic adaptation to heat stress. This dataset can be used to train machine learning methods for behaviour prediction from videos in conventional growing pigs.

    Louis Bonneau de Beaufort, 11 Dec 2025

    https://hal.inrae.fr/hal-05410799v1
  • [hal-05348017] Measuring shade use of dairy cattle at pasture with an on-cow light sensor: a case study

    Grazing cows preferentially access shade to shield against the sun. However, the conditions that provide cows with optimal shade access and use (e.g. no competition for access to shade) are still unknown. Continuous monitoring of shade use by grazing cattle could help to understand how and when cows use shade resources. The aim of this study was to validate a method based on a light sensor (HOBO Pendant MX2202) attached to the back (on the transverse processes of the lumbar vertebrae) of 7 dairy cows at pasture to continuously record their use of natural shade for research purposes. Live behavioral observations of shade use and cow posture were recorded in summer (June to September, between 9 am and 6 pm). Based on the behavioral observation data, we determined thresholds in lux to discriminate between cows in shade and cows in sun on a randomly-generated training dataset representing 15 % of the initial dataset. This process was repeated 100 times, generating 100 thresholds and threshold performances. Data loss due to sensor loss or battery discharge was 9 %, which is acceptable. The thresholds ranged from 15,688 to 40,556 lx: sensitivity ranged from 92.0 % to 99.8 % and specificity ranged from 88.7 % to 99.9 %, showing that the performances were robust to threshold variation within this range. This study demonstrates that an efficient threshold to discriminate cows in shade from cows in the sun can be determined via a relatively short (about 12 h) series of live observations. As performances seem to be slightly lower for lying cows than for standing cows (mean false-positive rate is 7.4 % for lying cows versus 1.8 % for standing cows), future studies should consider the posture (which can also be monitored continuously with other sensors such as an accelerometer installed on the legs or on the neck collar of the cows).

    Lydiane Aubé, 05 Nov 2025

    https://hal.inrae.fr/hal-05348017v1
  • [hal-05495907] De nouveaux alliés du bien-être des animaux confrontés aux défis des transitions agroécologiques et climatiques

    [...]

    Florence Gondret, 05 Feb 2026

    https://hal.inrae.fr/hal-05495907v1
  • [hal-05444004] Les technologies numériques en élevage : de la mesure à l’évaluation comportementale du bien-être de chaque animal

    Animal welfare is a difficult notion to define because it refers to a complex phenomenon, intrinsically linked to the individual's perception of its environment. Since it cannot be measured directly, welfare is assessed by determining and quantifying specific indicators. These indicators, whose variations are associated with different welfare states, must be combined according to the assessment context. Animal behaviour, recognised as one of the keys to welfare assessment, can change in response to variations in the rearing environment, such as access to pasture, which influence both the routine and the dynamics of the animals' use of space. Analysing these behavioural changes makes it possible to define new indicators, facilitating assessment of the positive or negative impact of these environmental modifications on animal welfare. Integrating sensor technologies, mathematical models, and artificial intelligence opens up new prospects for longitudinal monitoring of activities, spatial dynamics, and other parameters of interest throughout the animals' life cycle. For example, supervised classification algorithms have made it possible to associate raw sensor data with behaviours of interest, while unsupervised algorithms are expected to reveal new indicators related to animal welfare. This article highlights the opportunities offered by emerging digital technologies. We focus on behavioural assessment and its crucial role in welfare assessment, presenting three case studies: 1) distinguishing problems related to health, heat stress, and reproduction in dairy cows, 2) predicting lameness in dairy cows, and 3) studying emotions in pigs. Finally, we stress the importance of close interdisciplinary collaboration between ethologists, physiologists, mathematicians, and computer scientists to foster the development of this emerging field, which we refer to as “digital ethology”.

    Masoomeh Taghipoor, 06 Jan 2026

    https://hal.inrae.fr/hal-05444004v1
  • [hal-05419350] MetaNetMap: automatic mapping of metabolomic data onto metabolic networks

    MetaNetMap is a Python tool dedicated to mapping metabolite information between metabolomic data and metabolic networks. The goal is to facilitate the identification of metabolites from metabolomic data that are present in one or more metabolic networks to facilitate further modelling, taking into consideration that data from the former likely has distinct identifiers from the latter.

    Coralie Muller, 17 Dec 2025

    https://inria.hal.science/hal-05419350v1
  • [hal-05476560] Deriving breeding goals and expected selection responses to reduce environmental impacts in rainbow trout farming

    Background: With growing societal concerns about the sustainability of food production systems, there is increasing interest in considering not only economic gains but also environmental issues in breeding programs of farmed species. In this study, we compared expected selection responses for breeding programs aiming to minimize environmental impacts of the production of rainbow trout in France, one of the most important fish species in salmonid aquaculture. The consequences of genetic improvement based on environmental merit indices were investigated in a hypothetical rainbow trout production farm with a constant annual production of 300 tonnes of fish. The merit indices included three different traits: thermal growth coefficient (TGC), daily feed intake (DFI), and survival (SR). A cradle-to-farm-gate life cycle assessment was conducted to evaluate the environmental values of each trait, which served as weightings in breeding goals aiming at minimizing expected environmental impacts by genetic selection. We explored nine different environmental impact categories: climate change, terrestrial acidification, freshwater eutrophication, marine eutrophication, terrestrial ecotoxicology, freshwater ecotoxicology, land use, water dependence, and cumulative energy demand.

    Results: Selection accuracy ranged from 0.34 to 0.43, with the lowest accuracy observed for the breeding goal targeting reduced water dependence, and the highest for those targeting reductions in eutrophication and terrestrial ecotoxicity. Annual genetic gains in reductions of environmental impacts, expressed per tonne of trout, were high for reducing eutrophication potential (−6.80 to −2.61%) and terrestrial ecotoxicity (−4.14 to −1.59%), but negligible for water use reduction (−0.04 to −0.01%). Genetic changes in DFI and TGC led to substantial annual gains in feed conversion ratio, from 1.7 to 4.8%. However, SR showed no improvement and often declined, highlighting the difficulty of balancing genetic gains across traits.

    Conclusions: We demonstrated the benefits of using environmental values in breeding goals to minimize environmental impacts at the farm level, while maintaining high genetic gains in feed efficiency traits. Nevertheless, we also showed that selection efficiency was highly dependent on the impact category. Our results suggested that another selection strategy should be considered to avoid unfavourable consequences on SR.

    Simon Pouil, 26 Jan 2026

    https://hal.inrae.fr/hal-05476560v1
  • [hal-05435147] On Logic-based Self-Explainable Graph Neural Networks

    [...]

    Alessio Ragno, 30 Dec 2025

    https://hal.science/hal-05435147v1
  • [hal-04997560] Data paper: A goat behaviour dataset combining labelled behaviours and accelerometer data for training Machine Learning detection models

    This paper presents a dataset of accelerometer data and corresponding video-annotated behaviours from eight indoor dairy Alpine goats. Animals were equipped with 3D-accelerometers attached to their ears for 24 consecutive hours and recorded at a frequency of 5 Hz. Video recordings for this period were also obtained. Activities associated with positional, feeding and social behaviours were annotated over two daylight periods, for a total of 11 hours per goat, by a trained observer assuring high precision and consistency. This dataset can be used independently or complement an existing dataset for training supervised Machine Learning models for the detection of goat behaviour. It contributes to improving the robustness of such models by incorporating behavioural signals specific to indoor-housed goats.

    Sarah Mauny, 19 Mar 2025

    https://hal.inrae.fr/hal-04997560v1
  • [hal-05264391] Method: An accurate method for detecting drinking bouts in dairy cows based on reticulorumen temperature

    This study evaluated the performances of three methods for detecting drinking bouts in dairy cows using reticulorumen temperature (RT): the 'FixT' method based on a fixed RT threshold, the 'Cow-dT' method based on a cow-day-specific RT threshold, and the 'FallST' method based on RT fall slope. We observed the drinking behaviours of 28 dairy cows equipped with reticulorumenal sensors over 96 h to create a reference dataset. A total of 730 drinking bouts were observed. We matched detected drinking bouts against observed drinking bouts to obtain the number of true-positives, false-negatives, and false-positives, and then calculated the detection performances of the three methods in terms of sensitivity (Se), positive predictive value (PPV), and F-score. The performances of the three RT-based methods (Se ≥ 90%, PPV > 96% and F-score ≥ 93%) were better than those from previous work using collar-attached accelerometers, but slightly lower than methods using drinking troughs connected to electronic identification systems or methods combining accelerometers with geomagnetic sensors or with ultra-wideband location. The FallST method showed slightly better performance (highest F-score) than the FixT and Cow-dT methods. The FallST method accurately detected drinking bouts lasting more than 30 s and at least 30 min apart, with a detection time accuracy of 10 min. The models using RT curve parameters failed to predict characteristics of the drinking bouts. In conclusion, the method developed here can accurately detect drinking bouts in dairy cows using RT, but without further characterisation of the drinking bouts (e.g. duration).

    L. Aubé, 17 Sep 2025

    https://hal.inrae.fr/hal-05264391v1
  • [hal-05385353] WAIT4 – un projet de recherche alliant technologies numériques et IA pour évaluer des indicateurs pertinents de bien-être pour des animaux confrontés aux défis des transitions agroécologique et climatique

    The WAIT4 project harnesses the opportunities offered by digital technologies to measure different components of animal welfare in real time; it applies AI approaches to integrate the resulting data, which are heterogeneous both in nature and in time scale. The aim is to define new indicators, and the relevant frequency at which to measure them, in order to identify variations in animal welfare. Different species (pigs, small and large ruminants), in conventional, organic, or agropastoral systems, and under contrasting climates, are addressed. The ambition is to detect early deviations in welfare and health in response to changes in practices and to climatic hazards. The project implements concerted actions involving French research institutes (INRAE, CEA, INRIA, INSA), and a dialogue with stakeholders supported by LIT Ouesterel to facilitate the uptake and dissemination of results. The WAIT4 project (2023-2027), coordinated by INRAE, is funded by France 2030 as part of the PEPR Agroécologie et Numérique.

    Florence Gondret, 27 Nov 2025

    https://hal.inrae.fr/hal-05385353v1
  • [hal-05451323] Detecting signatures underlying the composition of biological data

    Biological compositional data is inherently multidimensional and therefore difficult to visualize and interpret. To allow for the automatic decomposition of large compositional data and to capture gradients in co-occurring features, called signatures, we developed a new software package 'cvaNMF'. Our benchmarks on synthetic data show the effectiveness of cross-validation and our novel signature-similarity method to identify a suitable decomposition using non-negative matrix factorization (NMF). This software provides a complete set of tools to identify and visualize biologically informative signatures which we demonstrate in a wide range of microbial and cellular datasets: 'Enterosignatures' detected in gut metagenomes differentiated human hosts with diverse diseases; five 'terrasignatures' from rhizosphere metagenomes differentiated root- or soil-associated microbiomes, while being refined enough to infer geographic distances between plants. Large-scale data from 13,000 metagenomes representing 25 biomes were decomposed into environmental and host-associated microbiomes based on five newly discovered signatures. Finally, analysis of the cell composition of non-small cell lung cancer samples allowed separation of cancerous and inflamed tissues based on four cell-type signatures.

    Anthony Duncan, 09 Jan 2026

    https://inria.hal.science/hal-05451323v1
  • [hal-05469230] cMFA for multi-omics data integration in microbial community models

    Understanding microbial community functions is challenging because of complex interactions and assembly mechanisms. However, recent advances in sequencing technologies have enabled the collection of multi-omics time-series data at the community scale, including population abundances as well as metabolomic and metatranscriptomic measurements. The main objective of this work is to develop a modeling framework capable of integrating such multi-omics time-series data to infer metabolic activity at the community level. We introduce a method called community Metabolic Flux Analysis (cMFA), which extends classical metabolic flux analysis to microbial communities. The approach relies on experimentally measured time-series data describing metabolite production and consumption rates, as well as microorganism growth. The goal is to infer, for each member of the microbial community, the distribution of intracellular metabolic fluxes that is consistent with these observations. The inference problem is formulated as a constrained regression problem in which predicted exchange fluxes are fitted to experimental measurements. The model incorporates biological constraints, including mass conservation at the intracellular level and bounds on metabolic fluxes. Additional information from metatranscriptomic data is integrated through a regularization term that guides the inference toward biologically plausible solutions. The main challenge lies in accurately recovering latent intracellular fluxes from a limited number of extracellular measurements. The cMFA method was evaluated using synthetic data generated from dynamic models of microbial communities of increasing complexity. These models were based on metabolic networks of different Escherichia coli mutants simulated using dynamic flux balance analysis. Synthetic metatranscriptomic data were derived from the internal fluxes of the dynamic models. Several regularization strategies were tested, including different sparsity levels, and multiple benchmarks were used to assess robustness. These benchmarks evaluated the sensitivity of the method to measurement noise, incomplete metatranscriptomic data, inaccurate prior knowledge of metabolite uptake rates, and increasing community size. Ongoing work focuses on applying the method to real experimental datasets, including denitrification processes and cheese production systems.

    ano.nymous@ccsd.cnrs.fr.invalid (Sthyve Junior Tatho Djeanou) 21 Jan 2026

    https://hal.science/hal-05469230v1
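
The constrained regression described in the cMFA abstract above can be written schematically as follows. This is only a sketch under stated assumptions: every symbol is introduced here for illustration and does not come from the deposit. It assumes a stacked flux vector $v$ for all community members, a matrix $E$ mapping fluxes to predicted exchange rates, measured rates $r$, an intracellular stoichiometric matrix $S$, flux bounds $l$ and $u$, and a regularizer $\Omega$ weighted by metatranscriptomic information $w$ with penalty weight $\lambda$.

```latex
\begin{aligned}
\min_{v}\quad & \lVert E v - r \rVert_2^2 \;+\; \lambda\,\Omega(v; w) \\
\text{subject to}\quad & S v = 0 \quad \text{(intracellular mass conservation)}, \\
& l \le v \le u \quad \text{(flux bounds)}.
\end{aligned}
```

The data-fitting term compares predicted and measured exchange fluxes, while the two constraints correspond to the mass conservation and flux bounds mentioned in the abstract; the exact form of the regularizer is not given there and is left abstract here.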
  • [hal-05380224] NINSAR Project: Defining Agroecological Routes Using Robots

    The poster presents the doctoral research of Mohammad Naim, conducted within the French national project NINSAR (New ItiNerarieS for Agroecology using cooperative Robots), and outlines how the thesis contributes to this broader research programme. The NINSAR project, as framed in the poster title and structure, is positioned as a national effort to define agroecological routes using robotics, integrating technological innovation with ecological, social, and economic sustainability goals. Within this context, the thesis investigates how autonomous agricultural systems can be designed, evaluated, and adopted without compromising core agroecological principles. The thesis analyzes the transition from Agriculture 4.0 to Agriculture 5.0 through the thirteen agroecological principles defined by the High Level Panel of Experts, assessing how emerging robotic and data-driven systems can support more sustainable production models. It evaluates three major categories of robotic field operations (data collection, soil and crop management, and navigation/communication) and links them to four principle-level agroecological indicators, finding strong contributions to soil health and synergy and weaker support for recycling. The work also conducts an empirical study of French farmers using the Technology Acceptance Model 2, identifying perceived usefulness as the central predictor of adoption, complemented by ease of use and social influence. A complementary technical study clusters 71 agricultural robots into five functional categories, illustrating the increasing specialization of robotic platforms and cost differences between electric and endothermic systems. The thesis further extends to the economic and industrial dimension of the NINSAR project by engaging manufacturers through semi-structured interviews to construct business model canvases aimed at identifying viable pathways for scaling agroecological robots. Taken together, the poster shows that Naim’s thesis forms a core component of NINSAR by integrating agronomic, technological, social, and economic analyses to support the development of robotics aligned with agroecological transition goals.

    ano.nymous@ccsd.cnrs.fr.invalid (Mohammad Naim) 24 Nov 2025

    https://hal.science/hal-05380224v1
  • [hal-05444605] Investigating pre-assembly clustering of HiFi reads for de novo assembly of complex metagenomes

    Despite advancements in sequencing technologies, metagenome assembly in taxonomically rich ecosystems remains challenging. Due to the abundance of low-coverage species, many regions in the assembly graph either lack coverage, are too complex, or present a combination of both factors. Clustering reads prior to assembly reduces complexity, but also decreases coverage within each cluster. While effective in improving short-read assemblies in proof-of-concept studies, it has not been widely adopted. In this work, we investigate whether upstream clustering of PacBio HiFi long reads improves assembly quality. To demonstrate the potential of this approach, we simulated an ideal read clustering by comparing the assembly of individual simulated genomes with that of those same simulated genomes merged within a complex ecosystem containing related species. We found that all genomes were better assembled in isolation than within the metagenome.

    ano.nymous@ccsd.cnrs.fr.invalid (Nicolas Maurice) 07 Jan 2026

    https://hal.science/hal-05444605v1
  • [hal-05459304] Coupling microbial communities models with data

    This presentation explores different mathematical models of microbial communities, with a focus on how models are tailored to the specificities of the microbial system and the available data. These models will be showcased in a range of microbial ecosystems including the gut microbiota, a cheese fermentation community, and biofilms. Finally, we will introduce the concept of digital twins for microbial systems, discussing their potential and challenges through concrete examples.

    ano.nymous@ccsd.cnrs.fr.invalid (Simon Labarthe) 15 Jan 2026

    https://hal.inrae.fr/hal-05459304v1
  • [hal-05368332] Modeling the emergent metabolic potential of soil microbiomes in Atacama landscapes

    Background: Soil microbiomes harbor complex communities from which diverse ecological roles unfold, shaped by syntrophic interactions. Unraveling the mechanisms and consequences of such interactions and the underlying biochemical transformations remains challenging due to niche multidimensionality. The Atacama Desert is an extreme environment that includes unique combinations of stressful abiotic factors affecting microbial life. In particular, the Talabre Lejía transect is a natural laboratory for understanding microbiome composition, functioning, and adaptation. Results: We propose a computational framework for the simulation of the metabolic potential of microbiomes, as a proxy of how communities are prepared to respond to the environment. Through the coupling of taxonomic and functional profiling, community-wide and genome-resolved metabolic modeling, and regression analyses, we identify key metabolites and species from six contrasting soil samples across the Talabre Lejía transect. We highlight the functional redundancy of whole metagenomes, which act as a gene reservoir, from which site-specific adaptations emerge at the species level. We also link the physicochemistry from the puna and the lagoon samples to metabolic machineries that are likely crucial for sustaining microbial life in these unique environmental conditions. We further provide an abstraction of community composition and structure for each site that allowed us to describe microbiomes as resilient or sensitive to environmental shifts, through putative cooperation events. Conclusion: Our results show that the study of multi-scale metabolic potential, together with targeted modeling, contributes to elucidating the role of metabolism in the adaptation of microbial communities. Our framework was designed to handle non-model microorganisms, making it suitable for any (meta)genomic dataset that includes high-quality environmental data for enough samples.

    ano.nymous@ccsd.cnrs.fr.invalid (Constanza M Andreani-Gerard) 17 Nov 2025

    https://inria.hal.science/hal-05368332v1
  • [hal-05178193] Spectral indices in remote sensing of soil: definition, popularity, and issues. A critical overview

    Serving as a powerful proxy in remote sensing studies, spectral indices can generate meaningful environmental interpretation from either raw or atmospherically corrected spectral data, and characterise and quantify some important properties of various objects on Earth’s surface. However, while numerous spectral indices have been developed over time, since the very launch of civilian satellites until now, some critical issues in their usage, such as comparability, remain scarcely studied, which may lead to incorrect, inconsistent, and unreliable results. In this study, we collected 471 spectral indices of various environment components (vegetation, water, and soil) that might be leveraged for soil studies, and traced their popularity in scientific publications over the past decades. The bibliometric analysis revealed a growing interest in and utilisation of spectral indices as Earth-observing satellite technology advanced. Based on both literature and, for the sake of complementation and illustration, some targeted regional-scale case studies, we discuss the issues of naming confusion, comparability, applicability, accuracy trade-offs, and reproducibility of using spectral indices. Overall, this overview provides an extensive list of spectral indices, both soil indices and soil-related indices, that can be useful for characterising these environment components by remote sensing. It draws attention to some misuses and confusions that must be avoided to prevent scientific pitfalls. The comparisons between different spectral indices, sensors, and correction methods highlight the confusing effects that the misuse and non-standardised practices of the spectral indices useful for soil may have on soil property mapping and monitoring. Insights into the judicious and appropriate usage of spectral indices in the remote sensing of soil are provided.

    ano.nymous@ccsd.cnrs.fr.invalid (Qianqian Chen) 24 Jul 2025

    https://hal.inrae.fr/hal-05178193v1
  • [hal-05340010] Deep-Plant-Disease Dataset Is All You Need for Plant Disease Identification

    Deep learning models have emerged as a promising alternative to conventional approaches for plant disease identification, a critical challenge in agricultural production. However, the existing plant disease datasets are insufficient to address the complexities of real-world agricultural scenarios, such as multi-crop disease, unseen, few-shot, and domain shift adaptation. Additionally, the lack of standardized evaluation protocols and benchmark datasets hinders the fair evaluation of models against these challenges. To bridge this gap, we introduce Deep-Plant-Disease, the largest and most diverse dataset with novel text data designed to enhance model generalization in multi-crop disease identification. We revisit and reformulate the task by establishing a standardized evaluation framework that supports consistent benchmarking and guides future research. Through experiments, we further validate the robustness and adaptability of models trained on our dataset, highlighting their effective transferability to real-world agricultural challenges.

    ano.nymous@ccsd.cnrs.fr.invalid (Abel Yu Hao Chai) 31 Oct 2025

    https://inria.hal.science/hal-05340010v1
  • [hal-05478330] Weakly supervised segmentation of leaf symptoms in field conditions

    Background: Crop diseases can cause significant yield losses. Deep learning models for computer vision offer powerful tools to enhance human observation of plant disease symptoms, for instance by using segmentation models to mark out foliar symptoms. However, the most common and effective architectures rely on fully supervised learning that requires numerous, costly and often unavailable pixel-level annotated images. To overcome this, we focus on weakly supervised segmentation [1]. The principle is to generate segmentation masks from less informative annotations, such as image-level labels, in order to train segmentation models with reduced annotation effort.

    ano.nymous@ccsd.cnrs.fr.invalid (Romane Dubois) 26 Jan 2026

    https://hal.science/hal-05478330v1
  • [hal-05343366] Forest Cover in the Congo Basin: Consistency Evaluation of Seven Datasets

    Tropical forests play an essential role in the carbon and water cycles of terrestrial ecosystems, but they are increasingly threatened by human activities and climate change. For places where ground observations are scarce, like in Equatorial Africa, remote sensing is a key source of information for monitoring the temporal and spatial dynamics of forests over large areas. Several Earth Observation-based global maps were developed in recent decades using different definitions of the land-use/land-cover (LULC) classes. While such products are widely used for monitoring land use and planning land management, the consistency of these LULC maps for the Congo Basin has never been analyzed and quantified at the ecosystem level. Here, we selected seven of the most-used global maps and analyzed their consistency over the Congo Basin. After reclassification into forest/non-forest masks and spatial resampling, we assessed the agreement and disagreement percentage across the different tropical ecoregions of Africa, from moist forest to miombo, including savanna. The datasets showed differences in forest area as a function of spatial resolution, with higher forest area levels at coarser resolutions (e.g., from 74.1% to 88.5% forest cover when upscaling the GLCLU from 30 m to 1 km over the Congo Basin). A higher agreement between the datasets was found for forest area over moist forest (between 88.18% and 99.38%) in comparison to savanna (32.82%-99.84%) and miombo (53.83%-99.7%). These discrepancies led to large differences in forest cover, varying from a net loss of 205,704 km² to a net gain of 50,726 km² over 2001-2019 depending on the dataset used. This study draws attention to the uncertainty associated with these products with regard to forests, particularly in regions of biological importance, such as the miombo and savanna regions, which remain poorly understood. Indeed, the two major uncertainties affecting the quality of LULC products are related to the different spatial resolutions and biological definition of "forest" adopted by each product.

    ano.nymous@ccsd.cnrs.fr.invalid (Solène Renaudineau) 03 Nov 2025

    https://hal.science/hal-05343366v1
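
The agreement percentage between forest/non-forest masks mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not give the exact agreement definition used, so this assumes simple pixel-wise matching between masks that have already been resampled to a common grid; the function names and the toy masks are hypothetical and are not data from the study.

```python
from itertools import combinations


def agreement_percentage(mask_a, mask_b):
    """Share of pixels (in %) on which two binary forest masks agree.

    Masks are flat sequences of 0 (non-forest) and 1 (forest) on the same grid.
    """
    if len(mask_a) != len(mask_b) or not mask_a:
        raise ValueError("masks must be non-empty and of equal length")
    same = sum(1 for a, b in zip(mask_a, mask_b) if a == b)
    return 100.0 * same / len(mask_a)


def pairwise_agreement(masks):
    """Agreement percentage for every pair of named masks."""
    return {
        (name_1, name_2): agreement_percentage(masks[name_1], masks[name_2])
        for name_1, name_2 in combinations(sorted(masks), 2)
    }


# Toy masks (hypothetical, 8 pixels each) standing in for three LULC products.
toy_masks = {
    "product_a": [1, 1, 0, 0, 1, 0, 1, 1],
    "product_b": [1, 1, 0, 1, 1, 0, 1, 0],
    "product_c": [1, 0, 0, 0, 1, 0, 1, 1],
}

for pair, value in pairwise_agreement(toy_masks).items():
    print(pair, value)
```

Computing the same quantity separately per ecoregion would only require restricting each mask to the pixels of that ecoregion before calling `agreement_percentage`.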
  • [hal-05322783] Whole genome sequencing dataset for a Vitis vinifera diversity panel

    Vitis vinifera is a significant agricultural species across continents and a genomic model for perennial crops. A diversity panel of 279 cultivars from the Vassal-Montpellier Grapevine Biological Resources Centre, which represents the diversity of the three main genetic pools of this species, has served as a foundation for genome-wide association studies using genotyping-by-sequencing approaches. Part of this panel (74 cultivars) has recently been sequenced at the whole genome level. Here, we release whole-genome sequencing of the remaining 205 cultivars of the panel, using the short-read NovaSeq6000 S4 PE150 technology to achieve complete genomic coverage. To ensure consistency with prior analyses and confirm genetic identities, we performed variant calling and SNP comparison with previously published data. During this stage, we identified two mislabeled samples, which were excluded from the dataset, resulting in a final set of 72 samples from the public data. Additionally, nine representative cultivars spanning major genetic groups underwent long-read sequencing using PacBio Revio technology. All sequences have been deposited at the ENA under project PRJEB95058 for the short-read data and project PRJEB100755 for the long reads. Variant data have been deposited in the publicly accessible GIGWA SNP database. This expanded genomic dataset establishes a comprehensive foundation for advanced genomic analyses in V. vinifera, including genome-wide association mapping, structural variant characterization, and genetic diversity assessment. The long-read sequences provide high-quality genomic resources for structural variation analysis and pangenome construction. The integration of short- and long-read sequencing technologies enhances the usefulness of this resource for understanding grapevine genomic architecture and supporting genetic improvement initiatives.

    ano.nymous@ccsd.cnrs.fr.invalid (Gautier Sarah) 20 Oct 2025

    https://hal.science/hal-05322783v1
  • [hal-05435107] Leveraging internal representations of GNNs with Shapley values

    [...]

    ano.nymous@ccsd.cnrs.fr.invalid (Ataollah Kamal) 30 Dec 2025

    https://hal.science/hal-05435107v1
  • [hal-05350945] Considering farmers’ needs in agroliving labs : a case study

    [...]

    ano.nymous@ccsd.cnrs.fr.invalid (Mélanie Broin) 06 Nov 2025

    https://hal.science/hal-05350945v1
  • [hal-05435121] Diffusion for Explainable Unsupervised Anomaly Detection

    [...]

    ano.nymous@ccsd.cnrs.fr.invalid (Elouan Vincent) 30 Dec 2025

    https://hal.science/hal-05435121v1
  • [hal-05318560] cMFA for multi-omics data integration in microbial community models

    Understanding microbial community functions is challenging due to complex interactions and assembly mechanisms; however, advances in sequencing have enabled the collection of multi-omics data, including population counts and metabolomic or metatranscriptomic data. Our main objective is to develop a mathematical model capable of integrating time series of multi-omics data at a community scale. We introduce the community metabolic flux analysis (cMFA) method, which generalizes metabolic flux analyses (MFA), using a list of time series data of experimentally measured production and consumption rates of metabolites and microorganism growth. We aim to infer, for each member of the microbial community, the intracellular distribution of metabolic fluxes by solving the inference problem. We evaluated the cMFA method on synthetic data from dynamic models of increasingly complex microbial communities, based on metabolic models of different mutants of Escherichia coli using dynamic flux balance analysis. Synthetic metatranscriptomic data were obtained from internal metabolic fluxes in the dynamic model. Different regularization terms were tested, including different levels of sparsity, for the selected penalty weight. To evaluate the robustness of the method, multiple benchmarks were tested. These included assessments of the robustness of the method to data noise, incomplete metatranscriptomic data, inaccurate prior knowledge of metabolic import rates and a larger microbial community. We are currently working with real data, including data on denitrification and cheese production.

    ano.nymous@ccsd.cnrs.fr.invalid (Sthyve Junior Tatho Djeanou) 16 Oct 2025

    https://hal.science/hal-05318560v1
  • [hal-05304541] Generation of metabolomic-informed models of metabolism for microbial communities

    The generation of genome-wide metabolic networks has become a routine analysis for individual organisms or communities. However, these automatically generated metabolic networks are incomplete because they are constructed based on the combination of gene annotation and reactions available in generic databases (Metacyc, BIGG, ModelSEED...). These are oriented towards well-known organisms or model organisms and miss out on important functions of secondary metabolism. We propose to combine metabolomic data analysis, metabolic modelling and annotation mining to build high-quality models of microbial metabolism with the long-term aim of better understanding microbial communities. In terms of application of the methods to plant microbial communities, we hope that the newly developed models will provide a better understanding of the process of microbial recruitment by the plant: metabolic functions involved, micro-organisms associated with these functions.

    ano.nymous@ccsd.cnrs.fr.invalid (Coralie Muller) 08 Oct 2025

    https://inria.hal.science/hal-05304541v1
  • [hal-05110984] Advancing agroecology and sustainability with agricultural robots at field level: A scoping review

    Agricultural robots show a growing potential to improve resource management and reduce the environmental impacts of farming. However, the evaluation of robots’ contribution to support sustainable farming is still lacking. This study specifically reviewed the operationalization of four agroecological principles at the field level: recycling, soil health, biodiversity and synergy. To this aim, a scoping review was conducted on the Scopus database, with a query within titles, abstracts, and author keywords mentioning robots, and agroecology or sustainability. The body of literature was screened to include only open field robots. The resulting 78 documents were coded inductively on three macro areas: (1) academic background, (2) robot operations, (3) contribution to agroecology principles, whether explicitly or implicitly mentioned. The results highlight that robots operationalize agroecology principles through non-chemical and selective weeding to preserve diversity and soil health, lighter designs that reduce soil compaction, and advanced data collection systems to optimize resource use and synergy. Solar-powered robots represent early steps toward recycling, but this principle remains understudied. The discussion expands on the potential of robotics in other innovative approaches for sustainable agriculture, such as agroforestry, conservation agriculture, and novel farming system design. Key challenges include ensuring farmers are enabled to master data collection and management, as well as integrating high-tech robotics with low-tech solutions. These efforts are critical for leveraging agricultural robotics to advance agroecology and sustainability across diverse farming systems.

    ano.nymous@ccsd.cnrs.fr.invalid (Mohammad Naim) 13 Jun 2025

    https://hal.science/hal-05110984v1
  • [hal-05304536] Generation of metabolomic-informed models of metabolism for microbial communities

    The generation of genome-wide metabolic networks has become a routine analysis for individual organisms or communities. However, these automatically generated metabolic networks are incomplete because they are constructed based on the combination of gene annotation and reactions available in generic databases (Metacyc, BIGG, ModelSEED...). These are oriented towards well-known organisms or model organisms and miss out on important functions of secondary metabolism. We propose to combine metabolomic data analysis, metabolic modelling and annotation mining to build high-quality models of microbial metabolism with the long-term aim of better understanding microbial communities. In terms of application of the methods to plant microbial communities, we hope that the newly developed models will provide a better understanding of the process of microbial recruitment by the plant: metabolic functions involved, micro-organisms associated with these functions.

    ano.nymous@ccsd.cnrs.fr.invalid (Coralie Muller) 08 Oct 2025

    https://inria.hal.science/hal-05304536v1
  • [hal-05459251] Modeling Microbial Communities: Toward Digital Twins

    Originally developed for industrial applications, digital twins are now attracting increasing interest in the life sciences. These tools aim to create a digital counterpart of a biological system, enabling data integration, real-time control, and enhanced scientific understanding. This presentation will define the concept of digital twin in the context of microbial ecology. It will address the scientific advances in microbial data acquisition, data analysis, modeling, and microbial engineering that are still needed to obtain effective digital twins of microbial communities. The discussion will be supported by examples of ongoing research in microbial systems modeling.

    ano.nymous@ccsd.cnrs.fr.invalid (Simon Labarthe) 15 Jan 2026

    https://hal.inrae.fr/hal-05459251v1
  • [hal-05283043] Assessing fruit tree vigor in peach and apple orchards through wood segmentation in ground-based RGB images

    [...]

    ano.nymous@ccsd.cnrs.fr.invalid (Khac-Lan Nguyen) 25 Sep 2025

    https://hal.science/hal-05283043v1
  • [hal-05340126] Accurate MAG reconstruction from complex soil microbiome through combined short- and HiFi long-reads metagenomics

    Background: Advances in high-fidelity long-read (HiFi-LR) sequencing technologies offer unprecedented opportunities to uncover the microbial genomic diversity of complex environments, such as soils. While short-read (SR) sequencing has enabled broad insights at gene-level diversity, the inherently limited read length constrains the reconstruction of complete genomes. Conversely, HiFi-LR sequencing enhances the quality and completeness of metagenome-assembled genomes (MAGs), enabling higher-resolution taxonomic and functional annotation. However, the cost and relatively low throughput of HiFi-LR sequencing can limit genome recovery, particularly at the binning stage, where coverage depth is critical. Results: Here, we present a novel hybrid strategy that differs from classical hybrid assemblies, where SR and LR reads are jointly used at the assembly step. Instead, we use high-depth SR data to improve the binning of HiFi-LR contigs. Using both SR and HiFi-LR metagenomic data generated from a tunnel-cultivated soil sample, we demonstrate that SR-derived coverage information significantly improves the binning of HiFi-LR assemblies. This results in a substantial increase in the number and quality of recovered MAGs compared to using HiFi-LR data alone and an incomparable improvement compared to SR data alone. Conclusion: Our findings highlight the power of combining SR and LR in highly diverse environments, such as soil, not for hybrid assembly per se, but to enhance the downstream binning process. The combination of SR and LR data substantially improves the downstream binning process and overall genome recovery. Importantly, this approach underscores the potential of leveraging the vast amount of publicly available Illumina metagenomic datasets. Completing existing SR resources with PacBio HiFi sequencing can maximise assembly contiguity and binning accuracy using massive amounts of SR data already generated. This highlights a practical and forward-looking strategy for microbiome research, where novel LR technologies will bring new value to previous short-read efforts.

    ano.nymous@ccsd.cnrs.fr.invalid (Carole Belliardo) 31 Oct 2025

    https://inria.hal.science/hal-05340126v1
  • [hal-05301772] Long-term evolution of forest cover in the Pacific coast of Ecuador (1960–2019): a comparison of Land Use/Land Cover (LULC) remote sensing products

    Ecosystem services provided by forests are increasingly threatened by anthropogenic and climatic disturbances. International initiatives to reduce greenhouse gas emissions from forest disturbances, such as Reducing Emissions from Deforestation and Degradation+ (REDD+), require robust quantifications of the dynamics and extent of Land Use/Land Cover (LULC). However, no study has yet presented a comparative synthesis of existing LULC products and long-term landscape evolution on the Pacific Slope and Coast of Ecuador (EPSC). In addition, previous studies on the evolution of forest cover in the EPSC were carried out on small regions and short time scales, and none covered the period before the 1990s. In this context, we conducted a long-term study of landscape dynamics at the scale of the EPSC over the last six decades (1960-2019). In addition, we propose a comparative synthesis of the main land use databases from remote sensing. To do this, we compared six LULC databases (HILDA+, ESA-CCI, MODIS, GLCLUC, TMF, GFC) derived from remote sensing using the Ecuadorian Ministry of Environment and Water (MAATE) LULC dataset as a reference. This comparison was performed with confusion matrices. Three metrics were calculated from the confusion matrices: Accuracy, F1-score and MCC. HILDA+ and TMF products showed the best agreement with the MAATE map (F1-score of 0.63 and 0.65, respectively). HILDA+ captured net forest cover losses better than TMF (65% vs 27% of the net losses recorded by MAATE). Of the six databases analysed, HILDA+ was identified as the product with the best correlation with the Ministry’s LULC maps. Therefore, HILDA+ was chosen to analyse deforestation since 1960 in the EPSC. The major limitation encountered using HILDA+ is the coarse spatial resolution of 1 km. Yet, four deforestation phases were identified in the EPSC over 1960–2019. They reflect the historical, social, political, and climatic context of each ecosystem. Over the entire period (1960-2019), forest cover decreased by 43.9%. Since the 1960s, tropical rainforest areas declined by a third. Dry and transitional tropical forests lost more than half their area.

    ano.nymous@ccsd.cnrs.fr.invalid (Valentine Sollier) 14 Oct 2025

    https://hal.science/hal-05301772v1
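
The three metrics named in the abstract above (Accuracy, F1-score and MCC) can be computed from a confusion matrix as in the minimal sketch below. It assumes a binary forest/non-forest confusion matrix, which the abstract does not specify; the function name and the example counts are hypothetical and are not results from the study.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Return (accuracy, F1-score, MCC) from binary confusion-matrix counts.

    tp: reference forest, mapped as forest
    fp: reference non-forest, mapped as forest
    fn: reference forest, mapped as non-forest
    tn: reference non-forest, mapped as non-forest
    """
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total
    f1_denominator = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denominator if f1_denominator else 0.0
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention here, MCC is set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / mcc_denominator if mcc_denominator else 0.0
    return accuracy, f1, mcc


# Hypothetical pixel counts comparing one product with a reference map.
accuracy, f1, mcc = binary_metrics(tp=40, fp=10, fn=20, tn=30)
print(f"accuracy={accuracy:.3f} f1={f1:.3f} mcc={mcc:.3f}")
```

MCC uses all four cells of the matrix, so it stays informative when forest and non-forest classes are strongly imbalanced, which is one common reason for reporting it alongside Accuracy and F1-score.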
  • [hal-05261543] cMFA for multi-omics data integration in microbial community models

    Understanding microbial community functions is challenging due to complex interactions and assembly mechanisms; however, advances in sequencing have enabled the collection of multi-omics data, including population counts and metabolomic or metatranscriptomic data. Our main objective is to develop a mathematical model capable of integrating time series of multi-omics data at a community scale. We introduce the community metabolic flux analysis (cMFA) method, which generalizes metabolic flux analyses, using a list of time series data of experimentally measured production and consumption rates of metabolites and microorganism growth. We aim to infer, for each member of the microbial community, the intracellular distribution of metabolic fluxes. This is a high-dimensional constrained linear regression problem, informed by mass conservation constraints and metatranscriptomic data, encoded in the penalty term. The difficulty here is in accurately inferring latent internal rates from a few observations of exchange fluxes. We evaluated the cMFA method on synthetic data from dynamic models of increasingly complex microbial communities, based on metabolic models of different mutants of Escherichia coli using dynamic flux balance analysis (dFBA). Synthetic metatranscriptomic data were obtained from internal metabolic fluxes in the dynamic model. Different regularization terms were tested, including different levels of sparsity, for the selected penalty weight. To evaluate the robustness of the method, multiple benchmarks were tested. These included assessments of the robustness of the method to data noise, incomplete metatranscriptomic data, inaccurate prior knowledge of metabolic import rates and expanding the study to a larger microbial community. Currently, we are working with real data, including data on denitrification and cheese production.

    ano.nymous@ccsd.cnrs.fr.invalid (Sthyve Junior Tatho Djeanou) 15 Sep 2025

    https://hal.science/hal-05261543v1
  • [hal-05260643] Spectral indices in remote sensing of soil: definition, popularity, and issues. A critical overview

    [...]

    ano.nymous@ccsd.cnrs.fr.invalid (Qianqian Chen) 15 Sep 2025

    https://hal.inrae.fr/hal-05260643v1
  • [hal-05281103] Evaluating the potential of Sentinel-2 data to assess the coarse fragment cover of the soil surface within a Spanish vineyard

    The presence of coarse fragments (CF) on the soil surface is a critical factor influencing the assessment of key soil properties such as hydraulic conductivity and C stocks, as well as erosion processes [1–3]. This study investigates the potential of Sentinel-2 (S2) data to estimate soil surface CF cover for an 82-ha trellis-trained vineyard (Burgos, Spain) with ~3 m inter-row spacing. CF cover (%) was estimated using the point-count method via SamplePoint [4], based on nadir photos taken at ~1 m above ground level at 60 points, repeated over three field campaigns. Based on two S2 time series (Jan 2023–Feb 2024 and Jan–Apr 2023 (vine dormancy)), six spectral indices computed within a 30 m buffer were clustered through hierarchical agglomerative clustering (HAC) and principal component analysis (PCA), which led to the selection of the Non-photosynthetic vegetation soil separation index (NSSI). Assessment of NSSI relevance relied on correlating NSSI values, extracted from S2 images closest to field campaign dates, with the average CF cover, with and without applying an NDVI threshold of 0.4. A Random Forest algorithm was then used to predict CF cover, with a 70% calibration / 30% validation split repeated over three random iterations. Two approaches were tested, with and without the NDVI threshold: (1) S2 bands only, and (2) S2 bands + NSSI + NDVI. NSSI was moderately correlated with CF cover (R² = 0.47–0.60), and the correlation was strongest when the NDVI threshold was applied (R² = 0.48–0.77). Calibration performance was good across all models (R² > 0.6; RMSE < 16.75%; RPD > 1.62; RPIQ > 2.23), even though validation results were variable. NDVI thresholding alone did not improve validation, but adding NSSI + NDVI as predictors enhanced validation accuracy. The best performance was obtained by combining data from all campaigns using S2 bands + NSSI + NDVI without any NDVI threshold (R² = 0.42; RMSE = 17.53%; RPD = 1.55; RPIQ = 2.05).

    ano.nymous@ccsd.cnrs.fr.invalid (Hayfa Zayani) 24 Sep 2025

    https://hal.science/hal-05281103v1
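
    The RMSE, RPD and RPIQ scores quoted in several abstracts of this list can be sketched as follows. This is a minimal illustration using the usual definitions (RPD = standard deviation of the observations / RMSE; RPIQ = interquartile range of the observations / RMSE), not the authors' code. The sample-standard-deviation and quartile conventions are assumptions, and the data values are hypothetical.

```python
import math
import statistics


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def rpd(observed, predicted):
    """Ratio of performance to deviation: SD of observations / RMSE.

    Assumption: the sample standard deviation (n - 1) is used.
    """
    return statistics.stdev(observed) / rmse(observed, predicted)


def rpiq(observed, predicted):
    """Ratio of performance to interquartile range: (Q3 - Q1) / RMSE.

    Assumption: quartiles follow the default method of statistics.quantiles.
    """
    q1, _, q3 = statistics.quantiles(observed, n=4)
    return (q3 - q1) / rmse(observed, predicted)


# Hypothetical CF cover values (%), for illustration only.
observed = [10.0, 20.0, 30.0, 40.0, 50.0]
predicted = [12.0, 18.0, 33.0, 39.0, 52.0]

print(f"RMSE = {rmse(observed, predicted):.2f}")
print(f"RPD  = {rpd(observed, predicted):.2f}")
print(f"RPIQ = {rpiq(observed, predicted):.2f}")
```

    Higher RPD and RPIQ values indicate a prediction error that is small relative to the spread of the observations. Quartile conventions differ between tools, so RPIQ values computed this way may not match other software exactly.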
  • [hal-05281203] Detection of soil management practices using Sentinel-1 time series: the challenges raised by the diversified management sequences in vineyards

    Characterization of soil management strategies in the complex agroecosystems of vineyards is crucial to evaluate their impact on vineyard soil health, particularly in the context of increasing soil threats in semi-arid environments [1], [2], [3], [4]. This study evaluates the potential of Sentinel-1 (S1) radar time series to detect soil management practices in Spanish vineyards. Two trellis-trained vineyard plots (ca. 4 ha each) located in the Toledo province (Spain) were studied, each subjected to a distinct soil management practice: conventional tillage (TILL) and a cover cropping system (CC), respectively. A farmer survey was conducted to thoroughly document the sequence, timing, and spatial distribution of management operations carried out between October 2020 and August 2024. A methodology based on S1 radar signal change detection was applied to detect changes in soil surface roughness associated with these practices. The survey data served as a reference to evaluate the accuracy of the S1-derived detections. Results revealed a very high degree of variability in vineyard management practices, in terms of type, spatial distribution and frequency within these fields. Despite such diversified management sequences, satellite-based detection was effective, on average over more than 60% of the plot surface area, for tillage in both the TILL and CC plots and for weed control and rolling in the CC plot only. Additionally, mechanical pruning was successfully detected in the TILL plot. Our further research will explore the integration of S1 radar data with S2 optical imagery to refine the detection and assessment of soil management practices in viticultural systems.

    ano.nymous@ccsd.cnrs.fr.invalid (Hayfa Zayani) 24 Sep 2025

    https://hal.science/hal-05281203v1
  • [hal-05285538] Spatial prediction of soil properties using Sentinel-2 temporal mosaics of non-vegetated soils in a semi-arid region: A comparative evaluation of Google Earth Engine and THEIA platforms in Sminja

    This study investigates the potential of Sentinel-2 (S2) temporal mosaics (TM) of non-vegetated soils for enhancing soil property mapping in the semi-arid Sminja Plain, Tunisia (480 km²). Utilising data from 2019 to 2023 across all seasons (autumn, spring, summer, and winter), we generated TM through the Google Earth Engine (GEE) and THEIA platforms. This comparative evaluation highlighted the importance of platform selection and seasonal considerations in remote sensing-based soil property predictions. Non-vegetated soils were isolated using thresholds of NDVI < 0.35 and NBR2 < 0.09 to maximise non-vegetated soil extraction. Key soil properties analysed through a dataset of 215 sample locations regularly spread over the area included electrical conductivity (EC), soil organic carbon (SOC), pH, base saturation (BS), granulometric fractions, and soil moisture content. Random forest (RF) models with K-fold cross-validation were used, and their predictive performance was evaluated using RMSE, RPD, and RPIQ metrics. Results indicate that both GEE and THEIA platforms effectively predicted (with THEIA having a very slight edge) most of the soil properties (SOC, CaCO₃, Ca, base saturation, granulometric fractions, and soil moisture content) with RPIQ values exceeding 1.7, while predictions for pH, EC, K, Na, and P₂O₅ were unreliable, with RPIQ < 0.8. This pinpointed the limitations of the generated RF models for certain soil properties in such environments. Seasonal variations slightly influenced model accuracy, underscoring the importance of platform selection and temporal considerations in remote sensing-based soil property prediction. These findings offer valuable insights for sustainable land management and agricultural planning in semi-arid regions.

    ano.nymous@ccsd.cnrs.fr.invalid (Mukhtar Adamu Abubakar) 26 Sep 2025

    https://hal.science/hal-05285538v1
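
    The non-vegetated soil selection described in this abstract (NDVI < 0.35 and NBR2 < 0.09) can be illustrated per pixel as below. This is a simplified sketch, not the GEE or THEIA processing chain. It assumes the standard index definitions, NDVI = (NIR - Red) / (NIR + Red) and NBR2 = (SWIR1 - SWIR2) / (SWIR1 + SWIR2), with Sentinel-2 bands B8, B4, B11 and B12. The function names and reflectance values are hypothetical.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or None when the denominator is zero."""
    total = a + b
    return None if total == 0 else (a - b) / total


def is_non_vegetated(red, nir, swir1, swir2, ndvi_max=0.35, nbr2_max=0.09):
    """Flag a pixel as non-vegetated soil using the thresholds from the abstract.

    Inputs are surface reflectances, assumed to come from Sentinel-2 bands
    B4 (red), B8 (NIR), B11 (SWIR1) and B12 (SWIR2).
    """
    ndvi = normalized_difference(nir, red)
    nbr2 = normalized_difference(swir1, swir2)
    if ndvi is None or nbr2 is None:
        return False  # no usable signal: treat the pixel as masked
    return ndvi < ndvi_max and nbr2 < nbr2_max


# Hypothetical reflectance values, for illustration only.
bare_pixel = dict(red=0.20, nir=0.25, swir1=0.30, swir2=0.28)
vegetated_pixel = dict(red=0.05, nir=0.45, swir1=0.20, swir2=0.10)

print(is_non_vegetated(**bare_pixel))
print(is_non_vegetated(**vegetated_pixel))
```

    A temporal mosaic would then aggregate, for each pixel, only the acquisition dates that pass this test. The aggregation rule itself is not specified in the abstract.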
  • [hal-05285601] Improving the prediction of soil organic carbon content using field-acquired hyperspectral data by accounting for soil moisture and surface roughness

    Soil surface conditions such as moisture, roughness, and vegetation complicate accurate Soil Organic Carbon (SOC) prediction by altering spectral reflectance. Most studies consider these factors separately and under controlled conditions. Soil roughness has rarely been included [1,2], and typically not alongside soil moisture, which has mostly been studied in laboratory settings [3]. Common methods to reduce moisture effects on spectra, such as external parameter orthogonalization (EPO) and direct standardization (DS), rely heavily on lab-based datasets [3]. To address this, we assessed the influence of soil moisture and surface roughness as co-variables in models predicting SOC content from reflectance spectra of bare Luvisols near Versailles, France. Spectral data were collected under natural light at 76 points, along with volumetric soil moisture (θ) and 7 roughness indicators from photogrammetry [4]. SOC was predicted using Partial Least Squares Regression (PLSR) and Random Forest (RF), with 4-fold cross-validation repeated 10 times. Six wavelength-selection (WS) strategies were tested: two from satellite simulations (EnMAP, Sentinel-2), two from model variable importance (PLSR, RF), one expert-based, and one using all wavelengths. Moisture and roughness were added individually. In-field spectra enabled reasonably accurate predictions, with RF outperforming PLSR (SOC RMSE: 1.6–1.8 g.kg⁻¹). WS methods improved accuracy only when co-variables were added. Moisture had little effect, while roughness improved prediction quality in most cases, especially shadow percentage for PLSR and the semivariogram sill parameter for RF. These results highlight the benefit of including surface roughness to improve large-scale SOC prediction from remote sensing.

    ano.nymous@ccsd.cnrs.fr.invalid (Hugues Merlet) 26 Sep 2025

    https://hal.science/hal-05285601v1
  • [hal-05285648] Spectral models learn the context, not the soil: rethinking SOC prediction from lab to drone measurements under field conditions

    Predicting soil organic carbon (SOC) using spectral data remains a challenge in digital soil mapping, particularly under field-scale conditions where environmental factors (e.g., vegetation, moisture) can mask or distort soil reflectance [1]. These unstable conditions are a major obstacle to model generalization across space and time. In this study, we evaluated the ability of SOC prediction models, built from reflectance spectra from lab to field and drone measurements, to account for and generalize across varying environmental conditions, over a unique field plot structure located in Nouzilly (France). The experimental design consists of 3 replicates (block design) of 4 to 5 tillage practices (modality) within a single 11.25 ha field. Two sampling campaigns (Oct 2024, May 2025) provided SOC (0-5 cm) and spectral data from lab, field, and UAV platforms at 75 sampling points. Co-variables such as moisture content and soil surface roughness were also collected. To assess model generalizability to new spatial and temporal conditions, we applied several data-splitting strategies: random splits, leave-one-block-out, leave-one-modality-out, and time-based splits between the October and May datasets. Our results show that tillage modality alone induced significant SOC variability at the soil surface, with mean SOC ranging from 12.1 g/kg under conventional tillage to 16.7 g/kg under minimum tillage. Seasonal differences between October and May also contributed substantially to SOC variability, further complicating model generalization. In this context, co-variables related to soil roughness and moisture had no significant impact on improving model accuracy. Model performance was highly sensitive to data-splitting strategy. Random splits gave overly optimistic results (R² = 0.75, RPIQ = 2.7 for field spectra), whereas leave-one-modality-out failed to generalize to unseen tillage practices, with most models showing R² < 0. Leave-one-block-out yielded reliable performance for laboratory spectra but failed for UAV and field data, especially under reduced environmental variability (e.g., in May or after NDVI-based filtering), with R² dropping from 0.72 to 0.28 for October UAV measurements. These findings suggest that models often rely on indirect or ephemeral environmental features rather than direct or intrinsic spectral behaviour of bare soil, resulting in unstable performance and poor transferability across space and time, even for similar soils.

    ano.nymous@ccsd.cnrs.fr.invalid (Hugues Merlet) 26 Sep 2025

    https://hal.science/hal-05285648v1
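
    The leave-one-block-out and leave-one-modality-out strategies mentioned in this abstract both hold out all samples sharing one group label. A minimal sketch of that splitting logic follows. It is not the authors' pipeline: the function name, block numbers and sample layout are hypothetical.

```python
def leave_one_group_out(groups):
    """Yield (held_out, train_idx, test_idx) for each distinct group label.

    All samples carrying the held-out label go to the test set, so the model
    never sees that block (or tillage modality) during training.
    """
    for held_out in sorted(set(groups)):
        train_idx = [i for i, g in enumerate(groups) if g != held_out]
        test_idx = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train_idx, test_idx


# Hypothetical layout: six sampling points spread over three blocks.
blocks = [1, 1, 2, 2, 3, 3]

for held_out, train_idx, test_idx in leave_one_group_out(blocks):
    print(f"held-out block {held_out}: train={train_idx} test={test_idx}")
```

    Passing a list of tillage modality labels instead of block numbers gives the leave-one-modality-out variant. Unlike a random split, neither variant lets samples from the same group appear on both sides of the split.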
  • [hal-05260506] Bare soil mosaicking optimisation for soil organic carbon prediction in Centre-Val de Loire

    [...]

    ano.nymous@ccsd.cnrs.fr.invalid (Qianqian Chen) 15 Sep 2025

    https://hal.inrae.fr/hal-05260506v1