Preliminary exploratory data analysis for the discrimination of Vicia samples using UV and FT-IR spectroscopy
The UV absorbance spectra of the methanol extracts of the ten Vicia samples were measured in the range of 200–400 nm, and the absorption bands appeared between 216 and 384 nm (Fig. 1S). These bands are likely due to different UV-active chromophores, such as aromatic, carbonyl, and various conjugated systems, in the Vicia phytochemicals that undergo π,π* and n,π* transitions63. In this study, the maximum UV absorbance (λmax) of the Vicia samples was observed at 276 nm. This observation is consistent with two previous studies that reported UV absorbance maxima at 276 nm for fava bean crude extract, its low-molecular-weight phenolic fraction, and a condensed tannin fraction21,64. It can be partially attributed to the complex array of phytochemicals in Vicia seeds that absorb close to 276 nm, which are mostly phenolic acids, flavonoids, condensed tannins, alkaloids, and jasmonates4,8,21,33,34,64,65,66,67. The majority of secondary metabolites in Vicia seeds are phenolic acids and polyphenols. Most phenolic acids of Vicia seeds are classified into two types: hydroxycinnamic acid derivatives and hydroxybenzoic acid derivatives. The most common hydroxycinnamic acids in Vicia seeds, with their UV absorbance maxima (λmax, in nm), are ferulic (λmax 218, 236, 285, 300), coumaric (λmax 226, 285, 305), chlorogenic, caffeic (λmax 220, 240, 294, 326), and sinapic (λmax 238, 322) acids4,8,33,34,65,66,67,68,69. The most prevalent hydroxybenzoic acids and related compounds in Vicia seeds include p-hydroxybenzoic acid (λmax 255), protocatechuic acid (λmax 260, 295), protocatechuic aldehyde (λmax 280, 311), syringic acid (λmax 276), vanillic acid (λmax 261, 294), vanillin, gallic acid (λmax 272), and salicylic acid (λmax 231, 296)4,8,21,33,34,65,66,67,68,70. Fava beans and other Vicia seeds are rich in various flavonoid classes. 
The most abundant are the flavonols, including quercetin (λmax 255, 370), kaempferol (λmax 266, 367), myricetin (λmax 254, 374), isorhamnetin (λmax 253, 370), and rutin (λmax 259, 359), as well as the flavan-3-ols, such as catechin (λmax 279) and epicatechin (λmax 279) and their gallate derivatives (λmax 274)4,6,8,20,21,33,34,64,65,66,67. Flavones, including apigenin (λmax 267, 296, 336), luteolin (λmax 253, 267, 349), and vitexin (λmax 270, 335), and the flavanone naringenin (λmax 289, 326) are also present in Vicia seeds. Furthermore, isoflavones, such as genistein (λmax 261) and daidzein (λmax 249, 303), have been reported in Vicia seeds in smaller amounts. Moreover, chalcones, such as isoliquiritigenin (λmax 258, 298, 367) and phloretin, have been isolated or detected in Vicia species4,6,8,20,33,34,65,66,67,71,72. Vicia seeds are also a significant source of polyphenolic compounds, notably condensed tannins (proanthocyanidins) such as procyanidin and prodelphinidin and their derivatives, with λmax of 276–279 4,8,21,64–66. Among the major nitrogenous compounds reported in Vicia seeds are vicine and convicine, the chief alkaloids in Vicia seeds, with λmax of 275 and 271, respectively4,6,8,73. In addition, Vicia seeds contain many nutritive amino acids, such as tryptophan, tyrosine, and phenylalanine, and their bioactive derivatives, such as L-dopa, which may give rise to UV absorption features in the range of 257–280 nm17,74. Regarding the jasmonate class, a handful of phytochemicals have been identified in Vicia seeds, including jasmonic acid, wyerone, wyerone epoxide, tuberonic acid, and ethyl jasmonate, with λmax around 220 and 290 4,8,67,75,76,77.

Fig. 1. PCA score plots resulting from the preliminary exploratory data analysis of (a) UV spectra and (b) IR spectra of the 10 Vicia seed samples.
Preliminary exploratory data analysis was performed on the average absorbance of three replicates for each of the 10 samples, across 163 variables representing the UV absorbance in the region of 200–400 nm. Each sample represents the UV spectrum of one of the eight cultivars of Vicia faba or of one of the two other Vicia species, Vicia sativa and Vicia monantha. To assess the variation between the UV spectra of the ten samples of Vicia seeds, principal component analysis (PCA) was applied with full cross-validation after mean centering of the UV data. PCA is an unsupervised data-reduction technique that produces a scatter plot known as a score plot, which allows qualitative assessment and visualization of the grouping, patterns, similarities, and variability among the samples. The resulting PCA score plot (Fig. 1a) clearly segregated the 8 fava bean samples from the two samples of Vicia sativa and Vicia monantha. The first two principal components, PC1 and PC2, together explained 99% of the total variation in the data. In the score plot, the samples of the eight fava bean varieties were positioned on the left (negative) side of PC1, while the Vicia sativa and Vicia monantha samples were located on the right (positive) side. These results suggest that the UV spectra of Vicia sativa and Vicia monantha are more similar to each other than to those of Vicia faba. In addition, the Vicia sativa sample was separated from the Vicia monantha sample only along PC2, which explains just 4% of the total variation in the data. This finding is consistent with the high degree of resemblance between the UV spectra of Vicia sativa and Vicia monantha. Furthermore, the Vicia faba samples formed 3 clusters along PC1 and PC2. 
The first cluster contains the Spanish cultivar Luz de Otoño (LUZ) sample, the second contains the two new Egyptian cultivars Maryout 2 (MRA) and Maryout 3 (MRB), and the third contains the five traditional Egyptian cultivars: Sakha 1 and 4 (SKHA and SKHB), Giza 716 and 843 (GZA and GZB), and Masr (MSR). This finding suggests that UV spectroscopy could be applied not only to discriminate fava bean samples from other Vicia species (VS and VM) but also to discriminate between at least some of the varieties within Vicia faba.
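The mean centering and PCA step described above can be sketched in a few lines. This is only an illustrative sketch: the spectra are synthetic stand-ins (a band near 276 nm plus a hypothetical shoulder), the function name `pca_scores` is our own, and NumPy's SVD is used in place of the chemometrics software of the study. Cross-validation is not included.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Mean-center X (samples x variables) and return the PCA scores and
    the fraction of total variance explained by each retained component."""
    Xc = X - X.mean(axis=0)                          # mean centering
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # sample coordinates
    explained = (s ** 2) / np.sum(s ** 2)            # variance fractions
    return scores, explained[:n_components]

# Synthetic stand-in for a 10 x 163 matrix of averaged absorbances
# (hypothetical values, not the measured spectra).
rng = np.random.default_rng(0)
wavelengths = np.linspace(200, 400, 163)
band = np.exp(-((wavelengths - 276) / 20) ** 2)      # band near 276 nm
shoulder = np.exp(-((wavelengths - 330) / 25) ** 2)  # assumed second band

# Eight "faba-like" spectra and two "other species" spectra that differ
# mainly in the intensity of the shoulder.
faba = np.array([band + 0.1 * shoulder + rng.normal(0, 0.005, 163)
                 for _ in range(8)])
other = np.array([band + 0.6 * shoulder + rng.normal(0, 0.005, 163)
                  for _ in range(2)])
X = np.vstack([faba, other])

scores, explained = pca_scores(X)
print(scores[:, 0].round(3))   # PC1 scores, one per sample
print(explained.round(3))      # variance explained by PC1 and PC2
```

In this toy example the two groups fall on opposite sides of PC1, mirroring the kind of separation seen in a score plot. The sign of each component is arbitrary, so which group lands on the negative side carries no meaning by itself.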
Regarding vibrational FT-IR spectroscopy, Fig. 2S presents the FT-IR spectra of the ten Vicia seed samples in the mid-IR region (4000–400 cm⁻¹). While all spectra display similar overall profiles, there is significant variability in spectral amplitude across samples, which was largely eliminated by applying the SNV algorithm. FT-IR is a valuable technique for identifying the functional groups present in the analyzed samples, and the spectra of all samples exhibited characteristic peaks indicative of various functional groups. A broad peak at approximately 3280 cm⁻¹ corresponded to OH stretching, while absorptions at ~2927 and 2850 cm⁻¹ were attributed to the asymmetric and symmetric stretching vibrations of methylene (–CH2) groups. Additional peaks were assigned to C≡N stretching at ~2225 cm⁻¹, O–C=O stretching of triglycerides at ~1735 cm⁻¹, C=O stretching at ~1640 cm⁻¹ for amides or other compounds containing carbonyl groups, N–H–C=O at ~1540 cm⁻¹ for amide II in protein, OH bending at ~1390 cm⁻¹ for phenols, C–C stretching or C–O bonds of polysaccharides at ~1230 cm⁻¹, C–O stretching of polysaccharides or C=C bending at ~1000 cm⁻¹ (aromatic rings of cellulose), and –C=O bending at 850 cm⁻¹. The main spectral peaks were ascribed to a variety of chemical components, such as water, proteins, polysaccharides, and lipids. These results are in accordance with previous studies78,79,80.

Fig. 2. PCA score plot (a) and HCA dendrogram (b) of the UV spectra of 40 samples: Vicia sativa and Vicia monantha (n = 8) and eight varieties of Vicia faba (n = 32).
PCA exploratory analysis was also conducted on the FT-IR spectroscopy data of the ten samples of Vicia seeds. The FT-IR absorption spectra of the ten Vicia samples in the region of 4000–400 cm⁻¹ were preprocessed using the Standard Normal Variate (SNV) algorithm to eliminate or reduce scatter effects, including the baseline shift and multiplicative effects arising from particle size and packing differences, and were then mean centered prior to PCA. The PCA score plot for the FT-IR spectra is presented in Fig. 1b. The first two principal components (PC1 and PC2) accounted for 69% and 8% of the total variation in the FT-IR spectroscopy data, respectively. As with the UV spectra, the FT-IR spectra effectively discriminated between the Vicia faba samples and the other Vicia species. The Vicia faba samples clustered on the left (negative) side of the PC1 axis, while the two samples of Vicia sativa and Vicia monantha were positioned on the right (positive) side, indicating clear separation along PC1. The two samples of Vicia sativa and Vicia monantha were further separated along PC2, even though PC2 accounted for only 8% of the total variance. This finding supports the notion of greater similarity between the IR spectra and chemical components of VM and VS, as previously observed with the UV spectra. However, the FT-IR spectra were less able than the UV spectra to discriminate between varieties within the Vicia faba species. The fava bean varieties formed only two clusters on the left half of the score plot (compared to 3 clusters in the case of UV spectra): one cluster contains all of the traditional and new Egyptian fava bean varieties, and the other contains the sample of the Spanish variety Luz de Otoño. 
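For readers unfamiliar with SNV, the transformation can be sketched as follows: each spectrum is centered on its own mean and divided by its own standard deviation, which removes an additive offset and a positive multiplicative scaling. The sketch below is illustrative only; the function name `snv` and the five-point toy spectra are hypothetical, and the sample standard deviation (`ddof=1`) is an assumed convention.

```python
import numpy as np

def snv(spectra):
    """Standard Normal Variate: center and scale each spectrum (row)
    by its own mean and standard deviation."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

# Toy example (hypothetical values): the same underlying spectrum recorded
# with different baseline offsets and multiplicative scatter factors.
base = np.array([0.10, 0.40, 0.90, 0.30, 0.20])
raw = np.vstack([base, 2.0 * base + 0.5, 0.5 * base - 0.1])

corrected = snv(raw)                            # row-wise SNV
centered = corrected - corrected.mean(axis=0)   # column mean centering before PCA
print(corrected.round(3))
```

After SNV the three toy spectra coincide, which is the amplitude harmonization the preprocessing is intended to achieve. Real spectra would of course retain their chemical differences.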
While no prior research has employed UV spectroscopy, a limited number of studies have used FT-IR and NIR spectroscopy to qualitatively discriminate between fava bean cultivars or between the growing locations or seasons of fava bean samples. Johnson and coworkers employed FT-IR to rapidly profile phytochemical variation among ten cultivars of Australian fava beans. They constructed a partial least squares discriminant analysis (PLS-DA) model that could classify the fava bean samples only by growing year (accuracy > 87%). Attempts to classify the samples by growth site using PLS-DA were less successful (59% accuracy)79. The same research group explored the use of FT-IR and NIR spectra to predict antioxidant activity and key chemical components in Australian fava bean varieties. First, none of the FT-IR models yielded satisfactory results for any investigated parameter. Second, the NIR models could not predict most of the analytes; they predicted only protein content and allowed a rapid approximate distinction between samples with high and low phenolics and antioxidant capacity80. A combination of the FT-IR absorption bands for proteins and polysaccharides with the mineral contents measured by inductively coupled plasma mass spectrometry (ICP-MS) successfully discriminated white varieties from green varieties of Chinese fava beans52,81. On the other hand, principal component analysis (PCA) of the FT-IR bands of only the protein or carbohydrate regions partially discriminated between Western Canadian fava bean varieties, while cluster analysis showed partial separation between “low tannin and regular tannin-containing” varieties52,82. NIR spectroscopy was more promising for identifying fava bean cultivars grown in various locations across China, based on spectral characteristics pertaining to protein, starch, oil, and polyphenols83. 
Regarding the discrimination between different Vicia species, Fayek and colleagues used an untargeted metabolomics approach based on UPLC-MS metabolite profiling to discriminate between 16 Vicia species, including Vicia faba and Vicia sativa. Their findings align with our results: PCA score plots based on UPLC-MS metabolite profiling data discriminated Vicia faba from the other 15 Vicia species, including Vicia sativa4. Our study showed a remarkable similarity between Vicia sativa and Vicia monantha in both UV and FT-IR spectra and a clear separation of both from the Vicia faba spectra. In the aforementioned study, 12 of the 16 species, including Vicia sativa, clustered together and failed to separate in the PCA score plot, indicating a similar metabolome across most Vicia species; only Vicia faba and three other species were separated and showed metabolite profiles distinct from those of the other 12 species4. These findings suggest that UV and FT-IR spectroscopy could serve as viable alternatives to UPLC-MS for discriminating Vicia faba from other Vicia species, offering lower costs, easier sample preparation and operation, and simpler data acquisition and analysis.
Based upon our preliminary exploratory data analysis of both UV spectra and FT-IR spectra, it seems that UV spectroscopy exhibited superior discriminatory capabilities compared to FT-IR spectra. Consequently, we opted to proceed with UV spectra for the development of more detailed unsupervised discrimination and clustering models, as well as supervised SIMCA and PLS-DA classification models, particularly for discriminating among some of the more closely related varieties within the same species, Vicia faba.
Building unsupervised PCA and HCA models based on the UV spectroscopy of Vicia seeds
The methanolic extracts of forty Vicia seed samples, comprising four samples each of Vicia sativa and Vicia monantha, and thirty-two samples distributed across eight fava bean varieties (four samples per variety), were analyzed for their UV absorbance spectra in the 200–400 nm range (Fig. 3S). The resulting data were subjected to unsupervised clustering techniques, including principal component analysis (PCA) and hierarchical cluster analysis (HCA), following mean centering. The PCA score plot (Fig. 2a) revealed a clustering pattern, similar to the results of previous preliminary exploratory analysis, displaying five well-separated clusters. Three of these clusters were closely grouped on the left (negative) side of the plot, representing the five commercial traditional Egyptian fava bean varieties, the Spanish variety Luz de Otoño, and the new Egyptian varieties Maryout 2 and 3, respectively. The remaining two clusters were located on the right (positive) side of the plot and corresponded to Vicia sativa and Vicia monantha.

Fig. 3. The PCA model of the eight fava bean varieties based on the UV spectra of the training set (n = 32 samples) (a). The model was challenged with 10 non-fava bean samples from Vicia sativa (VS) and Vicia monantha (VM), all of which clustered away from the training set samples (blue dots, n = 32) and appear as outliers (green dots, n = 10) in (b). The model was also challenged with a validation set (green dots, n = 25) representing the eight fava bean varieties, and all of these samples clustered correctly with their respective clusters (c).
Moreover, hierarchical cluster analysis (HCA) was applied to classify the samples based on the similarities and differences among their UV spectral data. The resulting HCA dendrogram (Fig. 2b) revealed a clear division of all Vicia seed samples into two main clusters. The first cluster was further divided into two subclusters, one corresponding to Vicia sativa and the other to Vicia monantha. The second main cluster was subdivided into three subclusters: one for the five traditional Egyptian fava bean varieties, another for the two new Egyptian varieties Maryout 2 and 3, and a third for the Spanish variety Luz de Otoño. The clustering pattern observed in HCA corroborated the findings of PCA, supporting two key conclusions. First, Vicia sativa and Vicia monantha were more similar to each other than to Vicia faba, from which they were clearly distinct. Second, the traditional commercial Egyptian fava bean varieties differed notably from the Spanish variety Luz de Otoño, and both were clearly distinguishable from the two new Egyptian varieties, Maryout 2 and 3.
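As a rough illustration of how such a dendrogram-based grouping is obtained, the sketch below applies agglomerative clustering to synthetic, mean-centered data using SciPy. The linkage method and distance metric are not stated in the text above, so Ward linkage with Euclidean distance is purely an assumption here, as are the two synthetic groups.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Synthetic stand-in data (hypothetical): two groups of "spectra" with
# clearly different mean intensity, 20 variables each.
rng = np.random.default_rng(1)
group_a = rng.normal(0.0, 0.05, size=(6, 20))
group_b = rng.normal(1.0, 0.05, size=(4, 20))
X = np.vstack([group_a, group_b])
Xc = X - X.mean(axis=0)                     # mean centering, as for PCA

# Agglomerative clustering; Ward linkage and Euclidean distance are
# assumptions made for this sketch.
Z = linkage(Xc, method="ward", metric="euclidean")

# Cut the tree into two main clusters and inspect the assignment.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

The linkage matrix `Z` is also what `scipy.cluster.hierarchy.dendrogram` takes as input to draw the tree. Cutting it at more clusters would expose subclusters of the kind described above.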
To further assess the effectiveness of UV spectra in conjunction with multivariate statistical models for identifying and classifying the three previously defined classes of fava bean samples, as well as discriminating them from non-fava bean samples such as Vicia sativa and Vicia monantha, a series of models was developed. A training set consisting of the previously employed UV spectra of 32 samples (four samples per variety), including 4 samples of the Spanish variety Luz de Otoño, 8 samples of the two new Egyptian varieties Maryout 2 and 3, and 20 samples of the five traditional commercial Egyptian fava bean varieties, was subjected to principal component analysis (PCA) to construct a PCA model specific to the varieties of Vicia faba. As anticipated, the PCA model trained exclusively on fava bean samples (Fig. 3a) separated these samples into three distinct clusters: one for the five traditional Egyptian fava bean varieties, another for the two new Egyptian varieties Maryout 2 and 3, and a third for the Spanish variety Luz de Otoño. Subsequently, this PCA model was challenged with 10 non-fava bean samples from Vicia sativa and Vicia monantha, all of which were identified as outliers on the PCA score plot (Fig. 3b). Furthermore, the trained PCA model was challenged with a validation set comprising 25 samples representing the eight fava bean varieties (Fig. 3c). Each of the 25 test samples fell within its corresponding cluster of training set samples, indicating the model’s robust discrimination power. Our findings revealed that the UV spectra of the five traditional Egyptian varieties are more similar to one another than to those of the new Egyptian varieties Maryout 2 and 3 or the Spanish variety Luz de Otoño. 
In this regard, previous comparative metabolite profiling based on LC-MS analysis conducted by Mekky and coworkers on the seeds and sprouts of three traditional Egyptian fava bean varieties, Giza 843, Sakha 3, and Nubaria 3, revealed a remarkable similarity in their qualitative chemical profiles8. On the other hand, fava beans have been shown to exhibit significant genetic variation in seed composition, size, and floral biology2,13,30. The composition of major polyphenol groups has been investigated in ten varieties of immature fava bean seeds cultivated in Chile, including Luz de Otoño and nine others. That study identified significant differences among these varieties, highlighting the ample phenotypic variability available for future selection studies focused on traits such as nutritional value, taste, and ease of production. Moreover, the same study revealed that Luz de Otoño, the Spanish variety, exhibited the lowest concentration of total phenolics and the highest levels of condensed tannins among all the studied varieties2. Regarding the new Egyptian Maryout varieties, a few previous studies comparing them with commercial traditional Egyptian varieties have revealed differences in certain traits, such as morphological characteristics, mean seed yield, and protein content84,85. Further chemical investigations are warranted to comprehensively elucidate the distinctions between these varieties.
Building supervised SIMCA and PLS-DA predictive models for classification of fava bean seeds
To further investigate the results of the unsupervised PCA model, the supervised pattern recognition methods SIMCA and PLS-DA were employed to build predictive classification models. Soft Independent Modeling of Class Analogy (SIMCA) is a chemometric technique that categorizes samples into pre-established groups, assigning new objects to the class to which they are most similar. SIMCA is strongly based on PCA because each class is defined by its own PCA model. The SIMCA classification process comprises two distinct phases: the training stage, in which individual class models are constructed, and the testing or validation stage, in which new samples (not used in the training phase) are classified against the established class models to assess the model’s efficiency. In our study, 3 distinct classes were established during the training phase, with an independent PCA model for each class. These classes represented the five traditional Egyptian fava bean varieties (20 samples, 4 samples per variety), the Spanish variety Luz de Otoño (4 samples), and the two new Egyptian varieties Maryout 2 and 3 (8 samples, 4 samples per variety), respectively. Subsequently, a validation set composed of 25 samples representing the eight fava bean varieties and 10 samples from non-fava bean species (Vicia sativa and Vicia monantha) was used. As shown in the SIMCA confusion matrix in the upper half of Table 2, the model achieved 100% classification accuracy for all three classes, with no misclassifications among the class of five traditional Egyptian fava bean varieties, the Spanish variety Luz de Otoño class, and the class of the two new Egyptian varieties Maryout 2 and 3. Crucially, the model correctly rejected all 10 non-target Vicia sativa and Vicia monantha samples during the validation phase, demonstrating excellent specificity. 
Further details on the classification of validation samples (not used in model training) are reported in the SIMCA classification table (Table 1S). It shows that the 25 validation samples representing different fava bean varieties were correctly classified as members of their corresponding classes. Conversely, none of the 10 non-fava bean samples from Vicia sativa and Vicia monantha was assigned to any of the 3 fava bean classes. Each sample is assigned to a class based on distance metrics specific to each class model, namely Si, the sample-to-model distance, and Hi, the leverage, which measures how far the sample lies from the model center. Three Si vs. Hi plots (Fig. 4a, b, and c) were used to evaluate the classification results; a sample that belongs to a given class should fall within the class membership limits, in the lower left region below the horizontal line. The validation samples representing the traditional commercial Egyptian fava bean varieties, the new Egyptian varieties Maryout 2 and 3, and the Spanish variety Luz de Otoño all lay within the membership boundaries of their respective models, with small distance and leverage, demonstrating the high sensitivity and predictive ability of the model. Moreover, the SIMCA model showed good specificity, as none of the non-fava bean samples of Vicia sativa and Vicia monantha was classified into any of the three classes; they appeared as extreme outliers in the upper right quadrant of the Si vs. Hi plots (Fig. 5a, b, and c). Additionally, the model distance between each pair of class models was estimated to assess how well the model discriminates the spectral signals of the 3 classes. This distance provides a measure of how separable the class models are. Good class separation is indicated by a distance greater than three, implying a high likelihood of distinguishing the classes from one another. 
In this study, the class models differed considerably, with interclass distances of approximately 89 and 32 for the two Maryout varieties class and the Spanish variety class, respectively, relative to the class of traditional fava bean varieties (see details of model distance in Fig. 4S). Furthermore, the discrimination power of all variables was greater than 2 (and for most of them greater than 3) between any pair of classes, reflecting the ability of the constructed SIMCA model to distinguish among the three classes of fava beans (see details of discrimination power in Fig. 5S). The ability of the SIMCA model to classify and discriminate between the UV spectra of the 3 classes of fava beans and to treat all non-fava bean samples as extreme outliers corroborates the previously constructed unsupervised PCA and HCA models. While the model performed well on this dataset, further validation with larger sample sets, including geographically diverse origins, would strengthen generalizability.
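The SIMCA logic described above (one PCA model per class, then a sample-to-model distance Si and a leverage Hi for each new sample) can be sketched as follows. This is a simplified illustration under stated assumptions and not the implementation used in the study: the data are synthetic, all names (`ClassModel`, `simca_classify`, `simulate`) are hypothetical, and the membership limits (three times the pooled training residual and an ad hoc leverage cutoff) replace the statistical limits that chemometrics software would normally compute.

```python
import numpy as np

class ClassModel:
    """PCA model of a single class with SIMCA-style distances (simplified)."""

    def __init__(self, X, n_components=1):
        self.n = X.shape[0]
        self.mean = X.mean(axis=0)
        Xc = X - self.mean
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        self.P = Vt[:n_components].T            # loadings (variables x components)
        T = Xc @ self.P                         # training scores
        self.tt = (T ** 2).sum(axis=0)          # sum of squared scores per component
        E = Xc - T @ self.P.T                   # training residuals
        self.s0 = np.sqrt((E ** 2).mean())      # pooled residual distance of the class
        self.h_limit = 3.0 * (n_components + 1) / self.n   # ad hoc leverage limit

    def distances(self, x):
        xc = x - self.mean
        t = xc @ self.P
        e = xc - t @ self.P.T
        si = np.sqrt((e ** 2).mean())                   # sample-to-model distance (Si)
        hi = 1.0 / self.n + np.sum(t ** 2 / self.tt)    # leverage (Hi)
        return si, hi

    def accepts(self, x, si_factor=3.0):
        # Assumed membership rule: both Si and Hi must lie below their limits.
        si, hi = self.distances(x)
        return bool(si <= si_factor * self.s0 and hi <= self.h_limit)

def simca_classify(models, x):
    """Names of all class models that accept x (an empty list means outlier)."""
    return [name for name, m in models.items() if m.accepts(x)]

# Synthetic class profiles (hypothetical shapes, not measured spectra).
rng = np.random.default_rng(0)
n_vars = 30
axis = np.linspace(0, 1, n_vars)
profiles = {
    "traditional": np.exp(-((axis - 0.3) / 0.1) ** 2),
    "maryout": np.exp(-((axis - 0.5) / 0.1) ** 2),
    "luz": np.exp(-((axis - 0.7) / 0.1) ** 2),
}

def simulate(profile, n):
    scale = 1.0 + rng.normal(0, 0.05, size=(n, 1))   # within-class intensity variation
    return scale * profile + rng.normal(0, 0.01, size=(n, n_vars))

models = {name: ClassModel(simulate(p, 8)) for name, p in profiles.items()}
foreign = np.exp(-((axis - 0.9) / 0.1) ** 2)   # stands in for a non-target species

print(simca_classify(models, profiles["maryout"]))
print(simca_classify(models, foreign))
```

A sample accepted by exactly one class model is assigned to that class, whereas a sample rejected by every model is treated as an outlier, which is how the non-fava bean samples behave in the Si vs. Hi plots.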

Fig. 4. Three Si vs. Hi plots for the validation samples (from the eight fava bean varieties, n = 25), showing the closeness of these samples to, and their classification into, one of the three class models of fava bean varieties. The samples that belong to a given class model lie in the lower left quadrant, with small distance and leverage to the model to which they belong. The validation samples from the five traditional Egyptian cultivars lie in the lower left quadrant of the traditional fava bean class model, while the Maryout and LUZ samples lie outside this quadrant, with high leverage and/or distance to the traditional Egyptian fava bean model (a). Only the Maryout validation samples lie in the lower left quadrant of the Maryout class (new Egyptian varieties), while all other varieties lie outside this quadrant (b). Finally, only the LUZ validation samples lie inside the lower left quadrant of the Spanish variety Luz de Otoño class, whereas all Egyptian varieties lie outside this quadrant (c).

Fig. 5. Three Si vs. Hi plots showing that all non-fava bean samples (n = 10) of Vicia sativa and Vicia monantha have both high model distance and high leverage. They therefore do not belong to any of the three fava bean classes and appear as extreme outliers in the upper right quadrant of the Si vs. Hi plots of the traditional Egyptian fava bean class model (a), the Maryout class model (b), and the Spanish variety Luz de Otoño class model (c). The other samples in the figure represent the validation set of fava bean varieties (n = 25).
The supervised discriminant method partial least squares discriminant analysis (PLS-DA) was implemented to augment the separation between the three classes of fava beans, namely the five traditional Egyptian fava bean varieties, the two new Egyptian varieties Maryout 2 and 3, and the Spanish variety Luz de Otoño. A PLS-DA calibration model with seven latent variables was created from the previously used training-set spectral data of the eight fava bean varieties, using leave-one-out cross-validation (the full cross-validation method). The score plot of the first and second latent variables for the calibration set (Fig. 6a) shows good class separation, with three distinct clusters along both factors 1 and 2. The samples of the two new Egyptian fava bean varieties appeared at the far right of the score plot, the five traditional varieties at the left, and the Spanish variety Luz de Otoño in the lower middle. Furthermore, Fig. 6b shows the accurate classification and clustering of the validation set samples with their correct fava bean classes, while Fig. 6c shows that the non-fava bean samples from other Vicia species clustered away from the three classes of fava bean varieties. As with the SIMCA classification results, the PLS-DA confusion matrix in the lower half of Table 2 highlights the model’s strong discriminatory performance among the three classes of fava bean varieties. Figure 6S shows the predicted-with-deviation plot for all the non-fava bean samples as well as the validation set samples of fava bean varieties. All validation samples from fava bean varieties were accurately assigned to their respective classes, achieving 100% classification accuracy. 
On the other hand, all 10 samples from Vicia sativa and Vicia monantha were predicted as outliers with very high deviation, demonstrating the high specificity of the PLS-DA model.
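To make the PLS-DA principle concrete, the following sketch regresses a one-hot class membership matrix on the spectra with a small PLS2 routine and assigns each validation sample to the class with the largest predicted value. It is a minimal illustration on synthetic data, not the model built in the study: the helper names (`pls_fit`, `pls_predict`, `simulate`) are hypothetical, two latent variables are used instead of seven, and neither cross-validation nor the deviation-based outlier detection is included.

```python
import numpy as np

def pls_fit(X, Y, n_components):
    """PLS2 regression: weights are the dominant left singular vector of
    X'Y, followed by deflation of X and Y after each component."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    E, F = X - x_mean, Y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_components):
        u, _, _ = np.linalg.svd(E.T @ F, full_matrices=False)
        w = u[:, 0]                     # weight vector
        t = E @ w                       # scores
        tt = t @ t
        p = E.T @ t / tt                # X loadings
        q = F.T @ t / tt                # Y loadings
        E = E - np.outer(t, p)          # deflate X
        F = F - np.outer(t, q)          # deflate Y
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q).T
    B = W @ np.linalg.inv(P.T @ W) @ Q.T    # regression coefficients
    return x_mean, y_mean, B

def pls_predict(model, X):
    """Predict class indices as the column with the largest predicted value."""
    x_mean, y_mean, B = model
    Y_hat = (X - x_mean) @ B + y_mean
    return Y_hat.argmax(axis=1)

# Synthetic class profiles (hypothetical shapes, not measured spectra).
rng = np.random.default_rng(0)
n_vars = 30
axis = np.linspace(0, 1, n_vars)
classes = ["traditional", "maryout", "luz"]
profiles = [np.exp(-((axis - c) / 0.1) ** 2) for c in (0.3, 0.5, 0.7)]

def simulate(profile, n):
    scale = 1.0 + rng.normal(0, 0.05, size=(n, 1))   # within-class variation
    return scale * profile + rng.normal(0, 0.01, size=(n, n_vars))

X_train = np.vstack([simulate(p, 8) for p in profiles])
y_train = np.repeat(np.arange(3), 8)
Y_train = np.eye(3)[y_train]                 # one-hot class membership matrix

model = pls_fit(X_train, Y_train, n_components=2)

X_val = np.vstack([simulate(p, 5) for p in profiles])
y_val = np.repeat(np.arange(3), 5)
y_pred = pls_predict(model, X_val)
accuracy = np.mean(y_pred == y_val)
print([classes[i] for i in y_pred])
print(accuracy)
```

Taking the largest predicted value always assigns a sample to some class, so rejecting non-target samples requires an additional criterion, such as the prediction deviation used above.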

Fig. 6. The score plot of the PLS-DA model shows how samples are projected onto the first 2 factors and how well the classes are separated. The calibration set of fava beans (blue dots, n = 32) formed three distinct clusters in the score plot, signifying 3 well-separated classes along factors 1 and 2 (a). Panel (b) (n = 57) shows the unknown validation samples (green dots, n = 25), which were accurately classified and clustered closely with the training samples (blue dots, n = 32) of their correct class. On the other hand, panel (c) (n = 67) shows that the non-target species samples (green dots, VS and VM in the lower right quadrant, n = 10) clustered far away from the three classes of fava bean varieties.
While the results of our classification models are very promising on the evaluated dataset, their generalizability may be constrained by the relatively small sample size and limited diversity, particularly in geographic origin (e.g., only one Spanish variety was represented). Future studies should extend our model by training and validating it on larger and more varied sample sets, including broader representation of cultivars from diverse geographical origins, to ensure robust applicability. However, our sample size per variety/species aligns with previous chemometric studies that used spectroscopy and mass spectrometry data to develop initial methods and models for discrimination among Vicia species and varieties. For instance, Fayek and colleagues applied multivariate techniques to LC-MS data to classify 16 Vicia species from over four European countries using only 3 samples per species4. Similarly, other studies applying IR included 3 samples per variety for 6 fava bean varieties grown in one place in Western Canada82 and 10 samples per variety for 10 fava bean varieties grown in two locations in South Australia80. Despite the limited sample size, the models developed in this study achieved excellent classification performance (100% accuracy) when grouping the 8 varieties into three key classes: class 1, the five traditional Egyptian fava bean varieties (20 training samples/15 validation samples); class 2, the Spanish variety Luz de Otoño (4 training samples/4 validation samples); and class 3, the two new Egyptian varieties Maryout 2 and 3 (8 training samples/6 validation samples). Crucially, the accurate classification of the Spanish variety provides a proof of concept for the transferability of the model to other global varieties outside Egypt. Moreover, the models excluded all 10 samples of non-fava bean Vicia species with perfect specificity, demonstrating robust discriminative power despite the sample size. 
Finally, we think that the limited inclusion of non-Egyptian fava bean varieties (only one Spanish variety) in this study, along with the fact that previous research has also largely focused on local varieties with small sample sizes, underscores the need for future multinational collaborative research. Such efforts could evaluate the geographical robustness of discrimination models and enhance their generalizability. Nevertheless, our discrimination and classification models provide significant value in discriminating and classifying some of the most popular competing varieties in the Egyptian market, particularly the widely used traditional Egyptian varieties and the newer, competing varieties, such as Maryout 2 and 3 and the Spanish variety Luz de Otoño. These varieties mainly compete in terms of seed and crop yield, nutritional quality (protein content), maturing time, and disease resistance under different environmental conditions10,84,85,86,87,88,89. Furthermore, the classification models developed in this study for Vicia varieties offer several potential applications, such as seed authentication and prevention of adulteration. Certain varieties may be misrepresented, intentionally or unintentionally, during trade or storage. Ensuring the correct identification and exchange of fava bean varieties by farmers and traders is crucial to avoid economic losses, compromised crop performance, and quality issues. Another important application of the classification models of fava bean varieties lies in optimizing agronomic decision-making and plant breeding programs, since some varieties may be more suitable for specific climates, soils, or agronomic practices (e.g., drought resistance, disease tolerance). For example, the new Egyptian Maryout varieties have been shown to outperform the traditional Egyptian varieties in seed yield and protein content, particularly under drought and rainfed conditions84,85. 
Another example is the reported yield of the early-maturing Spanish variety Luz de Otoño, which is lower than that of the traditional and new Egyptian varieties84,85,86,87,88. Our study has shown that the new Egyptian varieties of Maryout can be easily discriminated from the traditional Egyptian varieties and the Spanish variety based on rapid, low-cost, and simple models of spectroscopic (UV) analysis. A further application of these models is in the food industry and nutrition field, as fava beans and Vicia seeds are considered excellent sources of vegetarian protein, and nutritional value, such as protein content, varies significantly across varieties. Low-cost and rapid classification of varieties may support the selection of superior varieties for food products and nutritional planning. It is also worth mentioning the importance of classification models for Vicia seeds in the preservation of endemic species and the protection of traditional cultivars, which may be at risk of being lost or mixed with commercial lines, especially due to competition from newer and imported varieties in the Egyptian market.
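As an illustration of how the reported validation figures translate into performance metrics, the following minimal sketch computes accuracy, per-class sensitivity, and specificity from the validation counts stated above (15, 4, and 6 validation samples for Classes 1 to 3, and 10 excluded non-fava bean samples). The diagonal confusion matrix is an assumption inferred from the reported 100% accuracy, and the function and variable names are hypothetical; this is not the software used in the study.

```python
# Minimal sketch: performance metrics from the validation counts reported in the text.
# Assumption: a perfectly diagonal confusion matrix, as implied by the reported 100% accuracy.

def accuracy(confusion):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total

def recall_per_class(confusion):
    """Sensitivity for each class (rows = true class, columns = predicted class)."""
    return [row[i] / sum(row) for i, row in enumerate(confusion)]

def specificity(true_negatives, false_positives):
    """Fraction of out-of-class samples that were correctly rejected."""
    return true_negatives / (true_negatives + false_positives)

# Rows: Class 1 (traditional Egyptian), Class 2 (Luz de Otoño), Class 3 (Maryout 2 and 3)
validation_confusion = [
    [15, 0, 0],
    [0, 4, 0],
    [0, 0, 6],
]

val_accuracy = accuracy(validation_confusion)
per_class_recall = recall_per_class(validation_confusion)

# The 10 non-fava bean Vicia samples were all excluded (no false positives).
non_fava_specificity = specificity(true_negatives=10, false_positives=0)

print(f"validation accuracy: {val_accuracy:.2%}")
print(f"per-class sensitivity: {per_class_recall}")
print(f"specificity against non-fava samples: {non_fava_specificity:.2%}")
```

With only 4 validation samples in Class 2, a single misclassification would lower that class's sensitivity by 25 percentage points, which illustrates why validation on larger sample sets is recommended above.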
Total phenolics, flavonoids and DPPH radical scavenging activity
The total phenolics, flavonoids, and DPPH radical scavenging activity of the 8 fava bean varieties, as well as the 2 other Vicia species, are summarized in Table 3. The total phenolic content of the analyzed fava bean varieties ranged from 1.88 mg GAE/g extract for the Luz de Otoño variety to 39.88 mg GAE/g extract for the Sakha 4 variety, with an average of 22.07 mg GAE/g extract. Similarly, the total flavonoid content ranged from 0.57 mg QE/g extract for Luz de Otoño to 11.56 mg QE/g extract for the Giza 843 variety, with an average of 7.52 mg QE/g extract. In contrast, the Vicia sativa and Vicia monantha species showed higher levels of total phenolics and flavonoids than all fava bean varieties. These results are consistent with previous studies: two reports by Amarowicz and colleagues determined the total phenolics in Polish cultivars to be 23.9 and 56 mg GAE/g, respectively21,64. Furthermore, the total phenolics of three traditional Egyptian cultivars of fava beans, Nubaria3, Giza843, and Sakha3, were estimated to range from 21.8 mg GAE/g for Nubaria3 to 42.36 mg GAE/g for Sakha3 8. Moreover, two studies determined the range of total phenolics in a large number of Tunisian genotypes of fava bean seeds to be 16.98 to 67.47 mg GAE/g and 10.9 to 19.86 mg GAE/g, respectively12,13. It is also worth mentioning that the Spanish variety Luz de Otoño had the lowest levels of both phenolics and flavonoids in comparison to the Egyptian varieties in our study. A previous study corroborates this observation, reporting a total phenolic content of 0.82 mg in the fresh immature seeds of Luz de Otoño. Moreover, the Luz de Otoño variety exhibited the lowest total phenolic content among the ten fava bean varieties from Chile, Syria, and Spain examined in the same study2.
In addition, a previous study reported the total phenolic content of Vicia sativa and Vicia monantha to be 67.35 and 76.37 mg/g, respectively90. Regarding flavonoids, two previous studies on a large number of Tunisian fava bean genotypes estimated the total flavonoid content to range from 5.25 to 6.96 mg QE/g and from 5.19 to 9.3 mg RE/g, respectively12,13. In contrast, the total flavonoid content of Vicia sativa and Vicia monantha was reported to be much greater, at 47.34 and 65.23 mg/g, respectively90. The prevalence of phenolic compounds in plant species, particularly in legumes, is well-established and contributes substantially to their antioxidant capacity91. Our findings corroborate the pivotal role of these phenolic compounds in augmenting the health-promoting properties of Vicia seeds. The observed variations in the levels of phenolics and flavonoids among fava bean varieties have been documented in previous studies, highlighting the intricate interplay between genetic and environmental factors in determining the production of these compounds in different fava bean genotypes12,13,92.
The antioxidant capacity of Vicia faba seeds, measured as DPPH percentage radical scavenging activity (%RSA), was low at a concentration of 100 µg/ml, ranging from 2.81% for Luz de Otoño to 25.05% for Giza 843. However, the %RSA increased notably with extract concentration, ranging from 40.91% for Luz de Otoño to 88.79% for Giza 843 at 2 mg/ml. Among the fava bean varieties, Giza 843 demonstrated the most potent antioxidant activity with an IC50 value of 316.02 µg/ml, whereas Luz de Otoño exhibited the least antioxidant capacity with an IC50 value of 3232.52 µg/ml. Our results are consistent with previous reports8,21,64 that determined the antioxidant capacity of different fava bean cultivars. Mekky and colleagues determined the percentage radical scavenging activity of three different Egyptian fava bean cultivars at 100 µg/ml to be less than 25%, with the highest value for Giza 843, followed by Sakha3 and then Nubaria3. The antioxidant capacity of the methanolic extract of fava bean seed coat was reported to be higher than our results, with values of 44.28% and 61.05% at concentrations of 100 and 200 µg/ml, respectively93. Moreover, according to a previous study, fava bean pods showed superior antioxidant capacity compared to our results, with an IC50 of 87.35 µg/ml and a DPPH scavenging percentage of 65.7% at 250 µg/ml56. It is worth mentioning that the total phenolics in these studies were much higher than ours, which can account for the differences in antioxidant capacity. Furthermore, the variation in phenolic composition depends not only on genotype and environmental factors but also on the maturity stage and the plant part used (i.e., pods vs. seeds)13. In addition, the antioxidant capacity of both Vicia sativa and Vicia monantha was higher than that of any fava bean variety, which can be attributed to the high phenolic content of these two species.
In previous literature, a powerful antioxidant capacity has been reported for the ethanol extract of Vicia sativa23. Moreover, the polyphenol extract of Vicia sativa was superior to soybean and butylated hydroxytoluene in scavenging DPPH radicals26. The obtained results could be attributed to the presence of natural antioxidant phytochemicals like phenolics and flavonoids in Vicia seeds. These phytochemicals possess multiple hydroxyl groups in their molecular structure that can reduce or neutralize DPPH radicals through multiple mechanisms. This free radical scavenging activity might be valuable not only for promoting health and preventing disease but also in preserving foodstuffs, pharmaceutical products, and cosmetics94.
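To make the two antioxidant metrics discussed above concrete, the following minimal sketch shows how %RSA is commonly derived from DPPH absorbance readings and how an IC50 can be estimated by linear interpolation between the two concentrations that bracket 50% scavenging. The absorbance values are purely illustrative and are not data from this study; the function names are hypothetical, and linear interpolation is an assumed, simplified approach that may differ from the curve fitting used to obtain the IC50 values reported here.

```python
# Minimal sketch: %RSA from DPPH absorbance and IC50 by linear interpolation.
# All absorbance values below are illustrative placeholders, not data from this study.

def percent_rsa(a_control, a_sample):
    """Percentage radical scavenging activity relative to the DPPH control."""
    return (a_control - a_sample) / a_control * 100.0

def ic50_linear(concentrations, rsa_values):
    """Concentration giving 50% RSA, interpolated between the bracketing points.

    Returns None when 50% is not bracketed by the measured range
    (extrapolation is deliberately not attempted).
    """
    pairs = sorted(zip(concentrations, rsa_values))
    for (c_lo, r_lo), (c_hi, r_hi) in zip(pairs, pairs[1:]):
        if r_lo <= 50.0 <= r_hi and r_hi != r_lo:
            return c_lo + (50.0 - r_lo) * (c_hi - c_lo) / (r_hi - r_lo)
    return None

a_control = 0.800                               # hypothetical DPPH control absorbance
concentrations = [100, 250, 500, 1000]          # extract concentrations in µg/ml
sample_absorbances = [0.640, 0.480, 0.320, 0.160]  # hypothetical sample readings

rsa_values = [percent_rsa(a_control, a) for a in sample_absorbances]
ic50 = ic50_linear(concentrations, rsa_values)

print([round(r, 2) for r in rsa_values])
print(None if ic50 is None else round(ic50, 2))
```

A lower IC50 corresponds to a more potent extract, which is why Giza 843 (lowest IC50) is described above as the most active variety and Luz de Otoño (highest IC50) as the least active.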
Phytochemical variation among Vicia cultivars and species, such as the variation in phenolic and flavonoid content, may influence their bioactivity and health-promoting capabilities. Interestingly, our discrimination models are useful for discriminating certain species, such as Vicia sativa and Vicia monantha, which are rich in phenolics and flavonoids, from Vicia faba varieties. Furthermore, they offer significant value in discriminating the Spanish fava bean variety Luz de Otoño, which has a lower content of phenolics and flavonoids than the Egyptian varieties. Rapid discrimination and classification models based on low-cost UV spectroscopic analysis support the development of phytopharmaceuticals and functional foods by helping to avoid varieties with low levels of these compounds (e.g., Luz de Otoño) and to select the varieties and species that are richer in them. Future studies should optimize these models further for the development of phytopharmaceuticals and plant-based health products. The phytochemical parameters analyzed in this study are highly relevant to research in food science, nutritional analysis, functional foods, and phytopharmaceutical development. Moreover, our research may be a starting point for selecting which Vicia variety or species might be used in health-related applications. For example, a recent study selected the fava bean cultivar “Sakha 3” for neuroprotective and anti-inflammatory evaluation in a Parkinson's disease model18 based on a prior estimation of the phenolic content of three Egyptian cultivars in an earlier study by the same group8, demonstrating how phytochemical estimation, such as ours, can guide variety selection for health-related applications research.
Furthermore, the phytochemical results of this study might guide the food and pharmaceutical industries in the rational selection of the proper Vicia species or fava bean variety for the development of novel functional foods or phytopharmaceuticals, with the traditional fava bean cultivars Sakha 4 and Giza 843, as well as the Vicia sativa and Vicia monantha species, being the best candidates. Finally, the integration of spectroscopy with multivariate statistics for discrimination among Vicia seeds, together with phytochemical estimation, might be of interest to a wide range of disciplines and support applications in many fields, such as applied spectroscopy, crop authentication and prevention of adulteration, food science and nutritional analysis, agronomy optimization, plant breeding, plant-based health products and phytopharmaceutical development, and biodiversity.