@yarikoptic
Collaborator

@yarikoptic yarikoptic commented Nov 12, 2025

ATM there is no consistency in how author names are formatted across OpenNeuro datasets, e.g.

ds006267/dataset_description.json:
  Authors=['Katherine M. Cole', 'Shau-Ming Wei', 'Pedro E. Martinez', 'Tuong-Vi Nguyen', 'Michael D. Gregory', 'J. Shane Kippenhan', 'Philip D. Kohn', 'Steven J. Soldin', 'Lynnette K. Nieman', 'Jack A. Yanovski', 'Peter J. Schmidt', 'Karen F. Berman']
ds006269/dataset_description.json:
  Authors=['Lucy Pritchard', 'Ingrid Buller-Peralta', 'Sally M Till', 'Peter C Kind', 'Alfredo Gonzalez-Sulser']
ds006303/dataset_description.json:
  Authors=['Linke, Julia', 'Naim, Reut', 'Haller, Simone', 'Khosravi, Parmis', 'Scheinberg, Beck', 'Byrne, Meghan', 'Harrewijn, Anita', 'Leibenluft, Ellen', 'Brotman, Melissa', 'Winkler, Anderson', 'Pine, Daniel']

and that is why some are left ambiguous, like

ds003834/dataset_description.json:
  Authors=['Matteo Visconti di Oleggio Castello', 'James V. Haxby', 'M. Ida Gobbini']

where for Matteo I believe there is a composite last name of "Visconti di Oleggio Castello" per e.g.

❯ curl --silent https://raw.githubusercontent.com/bids-standard/pybids/refs/heads/main/.zenodo.json | grep Matteo
	  "name": "Visconti di Oleggio Castello, Matteo",

but for the other 2 authors, only the last word is the family name.

TODOs

  • validation: add a check for validator to WARN about using First Last, in particular if any of the names has more than 2 components? @effigies do you see an easy way to do that?
  • anywhere else in the text to add information about this?

@effigies
Collaborator

  • add a check for validator to WARN about using First Last, in particular if any of the names has more than 2 components? @effigies do you see an easy way to do that?

Regex?
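
A minimal sketch of what such a regex check might look like, written in Python for illustration only; the validator may express this differently. The pattern and the `ambiguous_authors` helper are hypothetical and not part of the BIDS schema or validator. The pattern flags entries that contain no comma and have more than two whitespace-separated components:

```python
import re

# Hypothetical pattern (an assumption, not taken from the BIDS schema):
# no comma anywhere, and at least three whitespace-separated components.
AMBIGUOUS_NAME = re.compile(r"^[^,\s]+(?:\s+[^,\s]+){2,}$")


def ambiguous_authors(authors):
    """Return entries whose family name cannot be identified reliably."""
    return [name for name in authors if AMBIGUOUS_NAME.match(name.strip())]


# Authors list from ds003834, as quoted in the PR description.
authors = ["Matteo Visconti di Oleggio Castello", "James V. Haxby", "M. Ida Gobbini"]
print(ambiguous_authors(authors))
# ['Matteo Visconti di Oleggio Castello', 'James V. Haxby', 'M. Ida Gobbini']
```

Two-component names such as 'Lucy Pritchard' and comma-separated names such as 'Linke, Julia' are not flagged by this sketch, so it would only produce a warning where the split between given and family names is genuinely unclear.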

@codecov

codecov bot commented Nov 12, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.83%. Comparing base (373da35) to head (53e86b7).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #2255   +/-   ##
=======================================
  Coverage   82.83%   82.83%           
=======================================
  Files          20       20           
  Lines        1672     1672           
=======================================
  Hits         1385     1385           
  Misses        287      287           

@rwblair
Member

rwblair commented Nov 13, 2025

I could support a warning for inconsistent comma usage: if any entry in Authors uses a comma, then either all should or none should. I can't see a way to do this in the current schema; I think we would need to add the equivalent of any(...) in Python or some(...) in JS, and we could add all(...) while we are at it. We don't currently have the idea of lambdas in the expression language. Without them, I imagine that instead of passing a function as an argument, the argument would be a standalone expression-language statement. That statement would then be applied to the context, with something added to the scope to represent the current element of the list.
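
For illustration, the consistency rule described above can be sketched in plain Python using the built-in `any` and `all`. This is only a sketch of the intended semantics, not a proposal for how the schema expression language would spell it; the function name `comma_usage_is_consistent` is hypothetical:

```python
def comma_usage_is_consistent(authors):
    """True if every entry uses a comma, or no entry does."""
    # One boolean per entry: the "current element of the list" evaluated
    # against the predicate "contains a comma".
    uses_comma = ["," in name for name in authors]
    return all(uses_comma) or not any(uses_comma)


# All entries use "Last, First": consistent.
print(comma_usage_is_consistent(["Linke, Julia", "Naim, Reut"]))  # True
# No entry uses a comma: consistent.
print(comma_usage_is_consistent(["Lucy Pritchard", "Sally M Till"]))  # True
# Mixed usage: this is the case that would trigger the warning.
print(comma_usage_is_consistent(["Linke, Julia", "Lucy Pritchard"]))  # False
```

Note that this check only detects mixed styles within one Authors list; it says nothing about which style is used when the list is internally consistent.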

Would anyone heed this warning given the current noise in the output?

@yarikoptic
Collaborator Author

  • add a check for validator to WARN about using First Last, in particular if any of the names has more than 2 components? @effigies do you see an easy way to do that?

Regex?

I must have been too tired! ;) The question now is "how". I now think the most logical approach would be to add a "format", which I pushed, but that might be too restrictive, leading to ERRORs right away?

Otherwise, we need some custom rule which would use matches, and I guess that is what @rwblair refers to: we do not have any way to map it across the values of a metadata field?

@yarikoptic
Collaborator Author

Ha -- so we are not testing against the "known to be ok" https://github.com/bids-standard/bids-examples/, which I assume I have broken here? @effigies WDYT -- wouldn't it be worth testing against some "release" (known to be good) of bids-examples, thus preventing "regressions" (previously valid becomes invalid) in the specification?
