Eleven studies were identified as including El Sharkwy as lead or sole author. These studied a total of 1909 participants between February 2010 and March 2018. Of the eleven, he was sole author of four studies published in 2013 (studies 1–4, with 764 participants); co-author, with one other, of two studies published in 2019 (studies 9 and 11, with 444 participants); and lead author, with between three and five others, of five studies published between 2016 and 2019 (studies 5, 6, 7, 8 and 10, with 707 participants; Table 1).
Three of the published studies were used in Cochrane reviews. Study 1 was used in two reviews [12, 18]; study 2 in three [19, 20, 21]; and study 3 in one [22].
Of the nine RCTs, no trial registrations were found for two (studies 3 and 5); study 1 was registered after it had been submitted for publication (submitted 18 Sept 2012; registered 26 Sept 2012); the remaining six (studies 6–11) were registered after their trials had started. Study 1 does not mention the start or end date of the trial in the publication. Only studies 8 and 11 give the trial registration number in the published study.
For studies 1 and 4, 270 women with polycystic ovarian syndrome (PCOS) resistant to clomiphene citrate (CC) were studied. Of these, 124 gave consent to a non-randomised study over 26 months (Feb 2010 - Mar 2012; study 4) and a further 146 were recruited from the same centre to an RCT in just 7 months (Jan - Jul 2012; study 1). Studies 9 and 11 were run concurrently at the same centre, where 444 women with CC-resistant PCOS were recruited in just 15 months. For both, the initial sample size was registered as 100 and later adjusted, but this was not mentioned in the publications. Study 9 recruited only obese women with CC-resistant PCOS; study 11 does not refer to obesity in its inclusion criteria but reports a mean BMI at inclusion of 29.6 (SD 2.4 and 3.3 for each group), so just under half of the participants must have had a BMI > 30.
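The "just under half" estimate for study 11 can be checked with a short calculation. The sketch below assumes that BMI is approximately normally distributed within each group, which the publication does not state; the function name is illustrative only.

```python
import math

def fraction_above(threshold, mean, sd):
    """Share of a normal(mean, sd) population lying above the threshold."""
    z = (threshold - mean) / sd
    # Upper tail of the standard normal distribution, 1 - Phi(z)
    return 0.5 * math.erfc(z / math.sqrt(2))

# Reported for study 11: mean BMI 29.6, SD 2.4 in one group and 3.3 in the other
for sd in (2.4, 3.3):
    print(f"SD {sd}: {fraction_above(30, 29.6, sd):.0%} of the group with BMI > 30")
```

Under this assumption roughly 43% and 45% of the two groups would have a BMI above 30, consistent with "just under half".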
All eleven studies were performed in one trial centre, Zagazig University Hospital. The first four studies were all published in 2013 and are all single authored. The ethics committee of Zagazig University approved nine of the studies; study 4 was approved by ‘a local ethics committee’.
It was possible to compare the baseline characteristics of the women in studies 1, 4, 9 and 11 and the outcomes of studies 1 and 4, all of which concern women with CC-resistant PCOS, as they all shared at least five parameters. We did not compare baseline characteristics or outcomes of women in the other studies, as they did not have five or more paired parameters.
The baseline characteristics of the participants in studies 1 and 4 are remarkably similar (Fig. 2). Sixteen pairs are identical and shown by red dots (two means, eleven SDs and three p-values), and 32 pairs are highly similar and shown by blue dots (fifteen means, nine SDs, and eight p-values); altogether, 48 out of 60 (80%) pairs are identical or highly similar. In addition, the first digit of the p-value is the same for each of the twelve paired baseline characteristics.
The outcomes also show marked similarities between the published papers in both the metformin and ovarian drilling groups (Fig. 3).
Similarities in before and after metformin outcomes: 13 pairs are identical (1 mean, 7 SDs and 5 p-values); 21 pairs are highly similar (11 means, 8 SDs, and 2 p-values); altogether, 34 out of 58 pairs (59%). In addition, the first digit of the p-value is the same for all nine paired baseline characteristics.
Similarities in before and after drilling outcomes: 15 pairs are identical (4 means, 7 SD/percentages and 4 p-values); 16 pairs are highly similar (6 means and 10 SD/percentages); altogether, 31 out of 53 pairs (58%). In addition, the first digit of the p-value is the same for each of the 9 paired baseline characteristics; and the BMI pairs show 100% similarity (4 identical pairs and 1 highly similar pair).
In reviewing the remaining studies, similarities were also noted between studies 9 and 11. Four pairs are identical (3 means and 1 SD); 30 pairs are highly similar (14 means, 14 SDs and 2 p-values); altogether, 34 out of 80 pairs (43%). In addition, the first digit of the p-value is the same for each of the 16 paired baseline characteristics.
The participants in studies 1, 4, 9 and 11 all have clomiphene-resistant PCOS, so it would be expected that the baseline characteristics would be similar. However, this should not affect digit use in the decimal places, which for all 4 studies is remarkably similar (Figs. 2 and 4). The same is not seen in the outcomes.
It was possible to evaluate the p-values of baseline characteristics across the nine RCTs, as they all included continuous variables. The overall p-value using the Kolmogorov–Smirnov test was highly significant (p < 0.0001); that is, the probability that trial participants were grouped according to properly randomised processes is very low (Fig. 5).
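As an illustration of this kind of check, the sketch below runs a one-sample Kolmogorov–Smirnov test of a set of baseline p-values against the uniform distribution on [0, 1], which is the distribution expected when groups are formed by proper randomisation. It is an assumption that the analysis took this form; the asymptotic p-value formula with a small-sample correction is one common approximation, and the input values are invented for demonstration, not taken from the studies.

```python
import math

def ks_uniform(p_values):
    """One-sample KS test against Uniform(0, 1): returns (D, approximate p)."""
    xs = sorted(p_values)
    n = len(xs)
    # D is the largest gap between the empirical CDF and the uniform CDF F(x) = x
    d = max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))
    # Asymptotic Kolmogorov distribution with a small-sample correction
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.2:
        return d, 1.0  # the tail probability is indistinguishable from 1 here
    p = 2 * sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
                for k in range(1, 101))
    return d, min(1.0, max(0.0, p))

# Hypothetical inputs: evenly spread p-values versus p-values bunched near 1
spread = [(i + 0.5) / 20 for i in range(20)]
bunched = [0.9 + 0.005 * i for i in range(20)]
print(ks_uniform(spread))
print(ks_uniform(bunched))
```

Evenly spread p-values give a small D and a p-value near 1, whereas p-values bunched together give a large D and a very small p-value.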
The published p-values show remarkable similarities between studies, as shown above, especially between studies 1 and 4 and studies 9 and 11.
Study 8 published p-values in 3 tables along with summary values but did not specify whether the value was median or mean, what the +/- referred to or what statistical test was used. The quoted p-values could not therefore be reliably checked.
In study 10, it was unclear what test had been used on what data, and there was uncertainty over the number of participants, as the demographic data table did not agree with the numbers given in the flow chart. The published values could not therefore be checked.
The p-value recalculations had a marked effect on the statistical significance of results (data in supplementary tables). For example, study 3 published non-significant differences in pre- and post-partum haemoglobin levels between the misoprostol and carbetocin groups, but both p-values show significance (< 0.001) after recalculation. Study 4 published a non-significant difference (p = 0.076) in BMI between the metformin and unilateral drilling groups after intervention, but recalculation shows statistical significance (p = 0.012), with significance defined in the article as a p-value of < 0.05. Study 6 published a significant difference in induction-to-delivery interval and in the percentage of women completing vaginal delivery within 24 hours, but recalculation shows non-significance for both at the significance level defined in the article. Study 11 published non-significant differences in the changes in post-treatment BMI, FSH and UL between the NAC and carnitine groups (p = 0.121, 0.156 and 0.114 respectively), but recalculation shows significant differences (p = 0.0004, < 0.0001 and < 0.0001 respectively).
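A p-value for a difference between two group means can be recomputed from the published means, SDs and group sizes alone. The sketch below is one way to do this, not necessarily the procedure used for the recalculations reported here: it assumes a two-sided comparison of independent groups and uses a normal approximation, which is reasonable only for groups of several dozen participants or more. The function name and the example numbers are hypothetical.

```python
import math

def two_sample_p(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sided p-value for a difference in means (normal approximation)."""
    # Standard error of the difference, allowing unequal variances
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = (mean1 - mean2) / se
    # Two-sided tail probability of the standard normal distribution
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical summary statistics, for illustration only
p = two_sample_p(11.0, 1.0, 100, 10.0, 1.0, 100)
print(f"recalculated p = {p:.2g}")
```

A recalculated value that falls on the other side of the article's stated significance threshold from the published value is the kind of discrepancy described above.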
A very large number of women with CC-resistant PCOS were recruited to studies 1, 4, 9 and 11 from the same single recruitment centre. Studies suggest that around 25% of the infertile population has PCOS, and that CC-resistant PCOS makes up about 20% of the PCOS population [23, 24]. For study 1 (recruitment Jan – July 2012), even with all eligible women consenting to participate, there must have been at least 730 women presenting with PCOS in seven months, which corresponds to 2,920 infertility presentations, or about 5,000 in one year. For study 4 (recruitment Feb 2010 – March 2012) there must have been at least 620 women presenting with PCOS in 26 months, which corresponds to 2,480 infertility presentations, or about 1,145 in one year. For studies 9 and 11 (which together recruited 444 women between Jan 2017 and March 2018), there must have been at least 2,220 women presenting with PCOS in just over a year, corresponding to about 8,880 infertility presentations, or about 8,200 in one year. This corresponds to over 30 new presentations of infertility every working day in a single centre.
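The back-calculation for studies 1 and 4 can be reproduced as follows. This is a minimal sketch that takes the two prevalence figures quoted above (20% and 25%) at face value and assumes that every eligible woman was recruited; the function and parameter names are illustrative.

```python
def implied_presentations(recruited, months,
                          cc_resistant_share=0.20, pcos_share=0.25):
    """Back-calculate the clinic workload implied by a recruitment total."""
    pcos = recruited / cc_resistant_share   # women presenting with PCOS
    infertility = pcos / pcos_share         # all infertility presentations
    per_year = infertility * 12 / months    # annualised rate
    return pcos, infertility, per_year

# Study 1: 146 women recruited in 7 months; study 4: 124 women in 26 months
for label, recruited, months in (("study 1", 146, 7), ("study 4", 124, 26)):
    pcos, infertility, per_year = implied_presentations(recruited, months)
    print(f"{label}: {pcos:.0f} PCOS, {infertility:.0f} infertility, "
          f"{per_year:.0f} per year")
```

The same function can be applied to any other recruitment total and period; the annualised figure is sensitive to the recruitment duration that is entered.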
We contacted Professor El Sharkwy in 2021 about the p-values in study 3. He agreed that the p-values were incorrect but was unable to provide any original data. In our latest correspondence with him (15 November 2021) we explained our concerns over studies 1, 4, 9 and 11. We suggested that if the data were not valid, he could retract the relevant papers. If, however, he did not feel able to retract or explain, it would be our duty to present our findings to other researchers and guideline authors to warn them of our concerns. This would, however, have the effect of calling into question all of his studies rather than just the four. There has been no response to date.