Take over the original PyPI project? #32

Open
mgorny opened this issue Sep 1, 2023 · 9 comments

Comments

mgorny commented Sep 1, 2023

Since the original cchardet project is clearly no longer maintained, have you tried contacting the original author to give you permissions to take the PyPI project? And if that failed, applying for PEP 541 name reuse?

Creating a fork has the problem that some packages will now require cchardet and some will require faust-cchardet, and both can't be installed simultaneously which causes major problems for distributions.

Member

wbarnha commented Sep 1, 2023

I reached out to the original author a long time ago but received no response. I forgot about PEP 541, thank you for bringing this to my attention. I will submit an application.

Edit: After reading the requirements for reachability:

Reachability

The user of the Package Index is solely responsible for being reachable by the Package Index maintainers for matters concerning projects that the user owns. In every case where contacting the user is necessary, the maintainers will try to do so at least three times, using the following means of contact:

the e-mail address on file in the user’s profile on the Package Index;
the e-mail address listed in the Author field for a given project uploaded to the Index; and
any e-mail addresses found in the given project’s documentation on the Index or on the listed Home Page.

The maintainers stop trying to reach the user after six weeks.

It seems I need to reach out to PyYoshi a few more times before the owner is considered "unreachable".
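The two numbers in the quoted text (at least three attempts, six weeks) can be modelled roughly as below. This is only one reading of the quote, not an official PyPI tool; the function name and the choice to count the six weeks from the first attempt are assumptions, and the actual decision rests with the index maintainers:

```python
from datetime import date, timedelta

MIN_ATTEMPTS = 3                     # "at least three times"
CONTACT_WINDOW = timedelta(weeks=6)  # "stop trying ... after six weeks"


def contact_window_elapsed(attempts, today, got_reply=False):
    """True when >= 3 attempts were made, six weeks have passed since the
    first one, and no reply was received (assumed interpretation)."""
    if got_reply or len(attempts) < MIN_ATTEMPTS:
        return False
    return today - min(attempts) >= CONTACT_WINDOW
```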

Author

mgorny commented Sep 1, 2023

Thanks.

I'm not sure if you are actually supposed to do that, and not the person handling your request. After all, how can PyPI admins know that you've actually contacted them?

I think filing a bug on their GitHub would also be a good step, as that is publicly visible.

Member

wbarnha commented Sep 1, 2023

> Thanks.
>
> I'm not sure if you are actually supposed to do that, and not the person handling your request. After all, how can PyPI admins know that you've actually contacted them?

I would forward emails to PyPI admins as evidence.

> I think filing a bug on their GitHub would also be a good step, as that is publicly visible.

Agreed, I don't like the idea of invoking PEP 541, but it seems that this project is in need of it. Opening up an issue in advance would be morally right.

Edit: Sorry, I'm tired. I misread maintainers, assuming it referred to me, not the index maintainers. I'm still going to reach out again to show good faith.

Mr0grog commented Oct 26, 2023

Howdy! Are there any updates on this?

Barring that, is there a future where the top-level name of this package is changed to alleviate collisions? (Granted it is useful that you can install this in place of cchardet and magically make other packages that know nothing about it work, but it does make a lot of situations messy, as the OP noted.)
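Until the naming question is settled, a downstream package could avoid hard-coding one import name by trying several in order. The sketch below is an assumption about how a consumer might do this, not something either project ships; `load_detector` is a made-up helper, and `chardet` is listed only as an example of a fallback module exposing a similar `detect()` function:

```python
import importlib


def load_detector(candidates=("cchardet", "chardet")):
    """Return the first importable module from `candidates`, or None."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Not installed (or failed to import); try the next candidate.
            continue
    return None
```

A caller would then check for `None` before calling `detect()` on whichever module was found.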

Member

wbarnha commented Oct 27, 2023

Sorry, there are no updates on this at the moment. I have not been able to allocate the time to work on this. 😓

Member

wbarnha commented Nov 7, 2023

Reached out to the original developer, haven't heard back.

mike-clark-8192 commented Mar 21, 2024

Could we use GitHub Actions to automate the release of this package to PyPI under a second, separate namespace? That way, people who are experiencing conflicts over import cchardet would have the option to depend on / pip install faust-faust_cchardet and use it as import faust_cchardet. The primary namespace for this fork could still be cchardet, but people could access it via the auto-synced, auto-published second package name to avoid the namespace overlap if they need or want that.

Incomplete GitHub Actions idea
name: Publish to PyPI with Renamed Namespace
on:
  push:
    tags:
      - 'v*'
jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v2
    - name: Set up Python
      uses: actions/setup-python@v2
      with: { python-version: '3.x' }
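    - name: Rename directory
      run: mv src/cchardet src/faust_cchardet
    # Rewrite both "import cchardet" and "from cchardet import ..." statements.
    - name: Update imports (if necessary)
      run: >-
        find . -type f -name '*.py' -exec
        sed -i
        -e 's/import cchardet/import faust_cchardet/g'
        -e 's/from cchardet/from faust_cchardet/g'
        {} +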
    - name: Build the package
      run: python setup.py sdist bdist_wheel
    - name: Publish the package to PyPI
      uses: pypa/gh-action-pypi-publish@v1.4.2
      with:
        user: __token__
        password: ${{ secrets.PYPI_API_TOKEN }}
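
The "Update imports" step could also be done in Python, which makes it easier to avoid touching unrelated names. This is a rough sketch under the same assumption as the workflow (the new module name `faust_cchardet` is the one proposed in this comment); `rewrite_imports` is a hypothetical helper, and it does not handle comma-separated imports such as `import os, cchardet`:

```python
import re

# Matches "import cchardet", "from cchardet import ..." and dotted forms such as
# "from cchardet.version import ...", but not names like "cchardet_extra".
_IMPORT_RE = re.compile(r"^(\s*(?:from|import)\s+)cchardet(?=[\s.,]|$)", re.MULTILINE)


def rewrite_imports(source, new_name="faust_cchardet"):
    """Return `source` with top-level cchardet imports pointed at `new_name`."""
    return _IMPORT_RE.sub(lambda m: m.group(1) + new_name, source)
```

Relative imports such as `from . import _cchardet` are left alone, since they keep working after the directory is renamed.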

Member

wbarnha commented Mar 23, 2024

> Could we use GitHub actions to automate the release of this package to PyPI under a second, separate namespace? That way people who are experiencing conflicts over import cchardet have the option to depend on / pip install faust-faust_cchardet and use it as import faust_cchardet? The primary namespace for this fork could still be cchardet, but people could access it via the auto-sync'd auto-published second package name to avoid the namespace overlap if they need/want that.

Hi, sorry I've been away! I've bitten off more than I can chew; I didn't expect this revival to become so important as a dependency. I'll file a PEP 541 request for cchardet and kafka-python, since I've been meaning to hand off these projects to people who have more of a pertinent interest in them.

milahu commented Nov 21, 2024

> Since the original cchardet project is clearly no longer maintained

This is no longer true. PyYoshi/cChardet had no PyPI releases between 2020-10-27 and 2024-06-06, but releases have resumed since then.

diff

cd $(mktemp -d)
git clone --depth=1 https://github.com/PyYoshi/cChardet
cd cChardet/
git remote add faust-cchardet https://github.com/faust-streaming/cChardet
git fetch faust-cchardet master
git rev-parse master
# fa74a8e43a2685767296f4cc5bc4594d28713ab1
git rev-parse faust-cchardet/master
# 3af7068fc6f04dc777531da021057bfbe75313b2
git diff --stat master faust-cchardet/master -- src/cchardet/
git diff master faust-cchardet/master -- src/cchardet/

git diff --stat

 src/cchardet/__init__.py        | 10 ++--------
 src/cchardet/__main__.py        |  4 ----
 src/cchardet/_cchardet.pyx      | 43 ++++++++-----------------------------------
 src/cchardet/cli/__init__.py    |  0
 src/cchardet/cli/cchardetect.py | 40 ----------------------------------------
 src/cchardet/version.py         |  1 +
 6 files changed, 11 insertions(+), 87 deletions(-)

git diff

diff --git a/src/cchardet/__init__.py b/src/cchardet/__init__.py
index f616d7f..c6db442 100644
--- a/src/cchardet/__init__.py
+++ b/src/cchardet/__init__.py
@@ -1,7 +1,5 @@
-from . import _cchardet
-
-version = (2, 2, 0, "alpha", 3)
-__version__ = "2.2.0a3"
+from cchardet import _cchardet
+from .version import __version__
 
 
 def detect(msg):
@@ -17,10 +15,6 @@ def detect(msg):
     encoding, confidence = _cchardet.detect_with_confidence(msg)
     if isinstance(encoding, bytes):
         encoding = encoding.decode()
-
-    if encoding == "MAC-CENTRALEUROPE":
-        encoding = "maccentraleurope"
-
     return {"encoding": encoding, "confidence": confidence}
 
 
diff --git a/src/cchardet/__main__.py b/src/cchardet/__main__.py
deleted file mode 100644
index a3e0fd8..0000000
--- a/src/cchardet/__main__.py
+++ /dev/null
@@ -1,4 +0,0 @@
-from .cli.cchardetect import main
-
-if __name__ == "__main__":
-    main()
diff --git a/src/cchardet/_cchardet.pyx b/src/cchardet/_cchardet.pyx
index 27d9f55..75af096 100644
--- a/src/cchardet/_cchardet.pyx
+++ b/src/cchardet/_cchardet.pyx
@@ -1,26 +1,19 @@
-# coding: utf-8
-#cython: embedsignature=True, c_string_encoding=ascii, language_level=3
-
 cdef extern from *:
     ctypedef char* const_char_ptr "const char*"
-    ctypedef unsigned long size_t
 
-# uchardet v0.0.8
 cdef extern from "uchardet.h":
     ctypedef void* uchardet_t
     cdef uchardet_t uchardet_new()
     cdef void uchardet_delete(uchardet_t ud)
-    cdef int uchardet_handle_data(uchardet_t ud, const_char_ptr data, size_t length)
+    cdef int uchardet_handle_data(uchardet_t ud, const_char_ptr data, int length)
     cdef void uchardet_data_end(uchardet_t ud)
     cdef void uchardet_reset(uchardet_t ud)
     cdef const_char_ptr uchardet_get_charset(uchardet_t ud)
-    cdef float uchardet_get_confidence(uchardet_t ud, size_t i)
-    # cdef const_char_ptr uchardet_get_encoding(uchardet_t ud, size_t i)
-    # cdef const_char_ptr uchardet_get_language(uchardet_t ud, size_t i)
+    cdef float uchardet_get_confidence(uchardet_t ud)
 
 def detect_with_confidence(bytes msg):
-    cdef size_t length = len(msg)
-
+    cdef int length = len(msg)
+    
     cdef uchardet_t ud = uchardet_new()
 
     cdef int result = uchardet_handle_data(ud, msg, length)
@@ -30,17 +23,8 @@ def detect_with_confidence(bytes msg):
 
     uchardet_data_end(ud)
 
-    cdef bytes detected_charset
-    # cdef bytes detected_encoding
-    # cdef const_char_ptr detected_language
-    cdef float detected_confidence
-
-    detected_charset = uchardet_get_charset(ud)
-    # detected_encoding = uchardet_get_encoding(ud, 0)
-    # detected_language = uchardet_get_language(ud, 0)
-    detected_confidence = uchardet_get_confidence(ud, 0)
-
-    uchardet_reset(ud)
+    cdef bytes detected_charset = uchardet_get_charset(ud)
+    cdef float detected_confidence = uchardet_get_confidence(ud)
     uchardet_delete(ud)
 
     if detected_charset:
@@ -53,8 +37,6 @@ cdef class UniversalDetector:
     cdef int _done
     cdef int _closed
     cdef bytes _detected_charset
-    # cdef bytes _detected_encoding
-    # cdef const_char_ptr _detected_language
     cdef float _detected_confidence
 
     def __init__(self):
@@ -62,8 +44,6 @@ cdef class UniversalDetector:
         self._done = 0
         self._closed = 0
         self._detected_charset = b""
-        # self._detected_encoding = b""
-        # self._detected_language = b""
         self._detected_confidence = 0.0
 
     def reset(self):
@@ -71,8 +51,6 @@ cdef class UniversalDetector:
             self._done = 0
             self._closed = 0
             self._detected_charset = b""
-            # self._detected_encoding = b""
-            # self._detected_language = b""
             self._detected_confidence = 0.0
             uchardet_reset(self._ud)
 
@@ -95,18 +73,13 @@ cdef class UniversalDetector:
                 self._done = 1
 
             self._detected_charset = uchardet_get_charset(self._ud)
-            # self._detected_encoding = uchardet_get_encoding(self._ud, 0)
-            # self._detected_language = uchardet_get_language(self._ud, 0)
-            self._detected_confidence = uchardet_get_confidence(self._ud, 0)
+            self._detected_confidence = uchardet_get_confidence(self._ud)
 
     def close(self):
         if not self._closed:
             uchardet_data_end(self._ud)
-
             self._detected_charset = uchardet_get_charset(self._ud)
-            # self._detected_encoding = uchardet_get_encoding(self._ud, 0)
-            # self._detected_language = uchardet_get_language(self._ud, 0)
-            self._detected_confidence = uchardet_get_confidence(self._ud, 0)
+            self._detected_confidence = uchardet_get_confidence(self._ud)
 
             uchardet_delete(self._ud)
             self._closed = 1
diff --git a/src/cchardet/cli/__init__.py b/src/cchardet/cli/__init__.py
deleted file mode 100644
index e69de29..0000000
diff --git a/src/cchardet/cli/cchardetect.py b/src/cchardet/cli/cchardetect.py
deleted file mode 100755
index 485174c..0000000
--- a/src/cchardet/cli/cchardetect.py
+++ /dev/null
@@ -1,40 +0,0 @@
-import argparse
-import sys
-
-from .. import UniversalDetector, __version__
-
-
-def read_chunks(f, chunk_size):
-    chunk = f.read(chunk_size)
-    while chunk:
-        yield chunk
-        chunk = f.read(chunk_size)
-
-
-def main():
-    parser = argparse.ArgumentParser()
-    parser.add_argument(
-        "files",
-        nargs="*",
-        help="Files to detect encoding of",
-        type=argparse.FileType("rb"),
-        default=[sys.stdin.buffer],
-    )
-    parser.add_argument("--chunk-size", type=int, default=(256 * 1024))
-    parser.add_argument("--version", action="version", version="%(prog)s {0}".format(__version__))
-    args = parser.parse_args()
-
-    for f in args.files:
-        detector = UniversalDetector()
-        for chunk in read_chunks(f, args.chunk_size):
-            detector.feed(chunk)
-        detector.close()
-        print(
-            "{file.name}: {result[encoding]} with confidence {result[confidence]}".format(
-                file=f, result=detector.result
-            )
-        )
-
-
-if __name__ == "__main__":
-    main()
diff --git a/src/cchardet/version.py b/src/cchardet/version.py
new file mode 100644
index 0000000..f43fee1
--- /dev/null
+++ b/src/cchardet/version.py
@@ -0,0 +1 @@
+__version__ = '2.1.19'
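
One behavioural difference visible in this diff is that `detect()` in PyYoshi/cChardet rewrites `MAC-CENTRALEUROPE` to `maccentraleurope`, while the fork returns the detector's name unchanged. Code that should behave the same with either package could normalize the result itself. A minimal sketch, assuming the `{"encoding": ..., "confidence": ...}` dict shape shown above; `normalize_result` and `_ALIASES` are illustrative names, not part of either package:

```python
# Names that, per the upstream code in the diff, are rewritten before returning.
_ALIASES = {"MAC-CENTRALEUROPE": "maccentraleurope"}


def normalize_result(result):
    """Return a copy of a detect()-style dict with the encoding name normalized."""
    encoding = result.get("encoding")
    # Unknown names (and None) pass through unchanged.
    return {**result, "encoding": _ALIASES.get(encoding, encoding)}
```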
