Skip to content

Conversation

@cxzhong
Copy link
Contributor

@cxzhong cxzhong commented Dec 5, 2025

Fix #41249 add sig_on and sig_off to the unsafe methods`

📝 Checklist

  • The title is concise and informative.
  • The description explains in detail what this PR is about.
  • I have linked a relevant issue or discussion.
  • I have created tests covering the changes.
  • I have updated the documentation and checked the documentation preview.

⌛ Dependencies

Refactor get and set methods to use indexing directly.
@cxzhong cxzhong reopened this Dec 5, 2025
@cxzhong cxzhong changed the title Add a bound check and Avoid segfault Use the safe method to avoid segfault Dec 5, 2025
@cxzhong cxzhong marked this pull request as ready for review December 5, 2025 15:00
@cxzhong cxzhong requested a review from dimpase December 5, 2025 15:19
Copy link
Member

@dimpase dimpase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get() is an unsafe method, as documented. What's the point of this? Just use __getitem__, which (should be, although I didn't check) is safe

@dimpase
Copy link
Member

dimpase commented Dec 6, 2025

please see my comment in #41249 - the whole point of get is to be as quick as possible. One can wrap the call to get_unsafe in sig_on/sig_off, introducing some overhead, less than your PR. But I don't see much point in it.

@cxzhong cxzhong changed the title Use the safe method to avoid segfault add sig_on and sig_off to avoid segfault Dec 6, 2025
@cxzhong
Copy link
Contributor Author

cxzhong commented Dec 6, 2025

In some cases, it will not cause a segfault

sage: zero_vector(3).get(5)
0
sage: zero_vector(3).get(8)
---------------------------------------------------------------------------
SignalError                               Traceback (most recent call last)
Cell In[2], line 1
----> 1 zero_vector(Integer(3)).get(Integer(8))

File sage/modules/free_module_element.pyx:1933, in sage.modules.free_module_element.FreeModuleElement.get()

SignalError: Segmentation fault
sage: zero_vector(3).get(10)
0
sage: zero_vector(3).get(19)
---------------------------------------------------------------------------
SignalError                               Traceback (most recent call last)
Cell In[4], line 1
----> 1 zero_vector(Integer(3)).get(Integer(19))

File sage/modules/free_module_element.pyx:1933, in sage.modules.free_module_element.FreeModuleElement.get()

SignalError: Segmentation fault

@dimpase
Copy link
Member

dimpase commented Dec 6, 2025

segfaults are triggered by hardware when you are trying to access memory not allocated for your process. So you can violate the boundaries of your array sometimes, without triggering it, it is normal.

I still wonder whether it's not better just to leave this get() alone. Maybe @nbruin can tell us.

@cxzhong
Copy link
Contributor Author

cxzhong commented Dec 6, 2025

I think the method is strange. like get() and set() methods most people will believe they are safe. But it makes them unsafe.

@dimpase
Copy link
Member

dimpase commented Dec 6, 2025

the naming is unfortunate. perhaps it should be renamed to getunsafe.

@cxzhong
Copy link
Contributor Author

cxzhong commented Dec 6, 2025

the naming is unfortunate. perhaps it should be renamed to getunsafe.

But we already have get_unsafe

@dimpase
Copy link
Member

dimpase commented Dec 6, 2025

get_unsafe is Cython only

@github-actions
Copy link

github-actions bot commented Dec 7, 2025

Documentation preview for this PR (built with commit 50390f5; changes) is ready! 🎉
This preview will update shortly after each push to this PR.

@user202729
Copy link
Contributor

user202729 commented Dec 17, 2025

do the following

  • make get_unsafe and set_unsafe cpdef instead of cdef
  • deprecate get and set, tell the user to use v[i] or get_unsafe and v[i] = value or set_unsafe respectively instead
  • delete get and set next year

looks like nbruin agree. But if one want to be careful, better git blame it first.

@cxzhong
Copy link
Contributor Author

cxzhong commented Dec 21, 2025

How do I change get and set to be safe method

@nbruin
Copy link
Contributor

nbruin commented Dec 21, 2025

  • make get_unsafe and set_unsafe cpdef instead of cdef

Actually, that's something that requires some testing. cpdef is not without cost: it includes (or at least it used to) include some code to test if it's inherited/overridden to make it behave a little bit more like a python method. That has a cost. It's worth checking if the penalty is measurable. For a very low-level routine such as accessing a vector entry it may very well be (it's the kind of thing that has a good chance of being inlined by the compiler in its simplest form).

So, to be safe, I would advise against cpdeffing them. If you want a python wrapper as well, define it separately and deal with the name clash that would arise on cython level (as in: don't name the wrapper the same as the cdef). WIth careful benchmarking you may be able to show this is not necessary.

@nbruin
Copy link
Contributor

nbruin commented Dec 21, 2025

How do I change get and set to be safe method

Just use __getitem__ and __setitem__. Those are safe and faster than even the unsafe wrappers.

@cxzhong
Copy link
Contributor Author

cxzhong commented Dec 21, 2025

How do I change get and set to be safe method

Just use __getitem__ and __setitem__. Those are safe and faster than even the unsafe wrappers.

I think maybe we could not expose these unsafe methods to users. it will lead segfault sometimes out of bound.

@dimpase
Copy link
Member

dimpase commented Dec 21, 2025

in Python, anything that starts with __ comes with no warranty whatsoever, and surely elsewhere in Sage one can find other __-starting things to break everything.

Besides you always have other ways to break the system, e.g. you can def print(..): to break printing, etc.

@nbruin
Copy link
Contributor

nbruin commented Dec 21, 2025

in Python, anything that starts with __ comes with no warranty whatsoever, and surely elsewhere in Sage one can find other __-starting things to break everything.

I think you might be confusing single _ prefix for private methods. Things like __setitem__ are standard slotted methods. I would say they come with quite the opposite of no warranty: they provide rather well-specified protocols and if sage doesn't adhere by them then that's a rather serious bug in sage (apart from the intentional protocol breakages that sage makes -- which are basically intentional bugs in sage from a python API perspective, like our violations of hash assumptions).

In python code, rather than using __setitem__ and __getitem__ directly, it's probably better to just use the A[...] syntactic sugar that cython offers, though.

@nbruin
Copy link
Contributor

nbruin commented Dec 21, 2025

I think maybe we could not expose these unsafe methods to users. it will lead segfault sometimes out of bound.

Yes, that's the conclusion we came to. On cython level, there's a benefit to get_unsafe because it can get compiled to a C function call (or be inlined!). But on python level, the ordinary method lookup for get apparently creates so much overhead that the bounds checking done in __getitem__ bounds checking is still faster.

That's also why we should NOT make get_unsafe a cpdef method: the extra stuff that gets added to make it behave more like a python method will likely produce a measurable slow-down.

Just deprecate get and point people to use index syntax instead. It doesn't look like we can do much better at the moment on python level.

@dimpase
Copy link
Member

dimpase commented Dec 21, 2025

we can rename get_unsafe, adding _ or __ in front

@nbruin
Copy link
Contributor

nbruin commented Dec 21, 2025

we can rename get_unsafe, adding _ or __ in front

What would we gain from that? The cython cdef get_unsafe is completely fine.

This PR is about the python-level method get that is unsafe AND slower than just indexing. So the rationale for exposing it at all is rather weak. The PR is NOT proposing to equip get_unsafe with sig_on and sig_off.

@nbruin
Copy link
Contributor

nbruin commented Dec 22, 2025

As an illustration of the cost of cpdef, consider the following:

cython("""
cdef class myclass:
    cdef int a
    cdef get(self):
        return self.a
    cdef set(self,int n):
        self.a = n
    cpdef pget(self):
        return self.a
    cpdef pset(self,int n):
        self.a = n
    def testset(self,N,n):
        cdef int_n = n
        for i in range(N):
            self.set(int_n)
    def testget(self,N):
        cdef int_n
        for i in range(N):
            int_n = self.get()
        return int_n
    def testpset(self,N,n):
        cdef int_n = n
        for i in range(N):
            self.pset(int_n)
    def testpget(self,N):
        cdef int_n
        for i in range(N):
            int_n = self.pget()
        return int_n
""")

With this code, instances of the original class are fine:

sage: A = myclass()
sage: %time A.testset(10^8,1)
CPU times: user 1.55 s, sys: 0 ns, total: 1.55 s
Wall time: 1.56 s
sage: %time A.testget(10^8)
CPU times: user 1.09 s, sys: 920 µs, total: 1.09 s
Wall time: 1.09 s
1
sage: %time A.testpset(10^8,1)
CPU times: user 1.68 s, sys: 918 µs, total: 1.68 s
Wall time: 1.68 s
sage: %time A.testpget(10^8)
CPU times: user 1.38 s, sys: 946 µs, total: 1.38 s
Wall time: 1.38 s
1

However, if we subclass, the p in the cpdef means it needs to test if the method has been overridden on the subclass, which costs time:

sage: %time B.testset(10^8,1)
CPU times: user 1.66 s, sys: 973 µs, total: 1.66 s
Wall time: 1.66 s
sage: %time B.testget(10^8)
CPU times: user 1.16 s, sys: 979 µs, total: 1.16 s
Wall time: 1.16 s
1
sage: %time B.testpset(10^8,1)
CPU times: user 4.08 s, sys: 0 ns, total: 4.08 s
Wall time: 4.08 s
sage: %time B.testpget(10^8)
CPU times: user 3.76 s, sys: 2.95 ms, total: 3.76 s
Wall time: 3.76 s
1

The cpdef method has different semantics: it can actually be overridden in a python subclass, whereas the cdef can only be overridden in a cython subclass (because the method doesn't even exist on python level!), but there is a significant price for that. So don't change get_unsafe and set_unsafe to cpdef. That's not what they are intended for.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Segmentation Fault when trying to get out-of-bounds index of vector

4 participants