(The complete code is a bit lengthy, so it’s attached at the end of the post.)
When performing LDA topic extraction with Gensim, my program failed at import time with the following error, even though my own code never references the triu function:
File "~/topicmodel.py", line 1, in <module>
from gensim import corpora, models
File "~/miniconda3/envs/lda_td/lib/python3.12/site-packages/gensim/__init__.py", line 11, in <module>
from gensim import parsing, corpora, matutils, interfaces, models, similarities, utils # noqa:F401
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "~/miniconda3/envs/lda_td/lib/python3.12/site-packages/gensim/corpora/__init__.py", line 6, in <module>
from .indexedcorpus import IndexedCorpus # noqa:F401 must appear before the other classes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "~/miniconda3/envs/lda_td/lib/python3.12/site-packages/gensim/corpora/indexedcorpus.py", line 14, in <module>
from gensim import interfaces, utils
File "~/miniconda3/envs/lda_td/lib/python3.12/site-packages/gensim/interfaces.py", line 19, in <module>
from gensim import utils, matutils
File "~/miniconda3/envs/lda_td/lib/python3.12/site-packages/gensim/matutils.py", line 20, in <module>
from scipy.linalg import get_blas_funcs, triu
ImportError: cannot import name 'triu' from 'scipy.linalg' (~/miniconda3/envs/lda_td/lib/python3.12/site-packages/scipy/linalg/__init__.py)
My environment:
- Operating System: Ubuntu 22.04
- Python Version: 3.12.3
- Gensim Version: 4.3.2
- SciPy Version: 1.13.0
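To isolate the failure from Gensim, I ran a quick check of whether scipy.linalg still exports triu in this environment. The numpy fallback below is only there so the check itself runs; numpy.triu computes the same upper-triangular matrix:

```python
import numpy as np

# Check whether scipy.linalg still exports triu in this environment;
# fall back to numpy.triu, which computes the same upper-triangular matrix.
try:
    from scipy.linalg import triu
    source = "scipy.linalg"
except ImportError:
    from numpy import triu
    source = "numpy"

a = np.arange(9).reshape(3, 3)
print(source)
print(triu(a).tolist())
```

With SciPy 1.13.0 installed, this falls through to the numpy branch, which matches the traceback above.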
Here’s my code:
from gensim import corpora, models  # Topic-modeling classes
import re  # Regular expressions for text cleaning
import nltk  # Natural Language Toolkit for tokenization
from nltk.corpus import stopwords  # Stopword lists
from nltk.stem import WordNetLemmatizer  # Lemmatizer for word normalization
from collections import Counter  # Word-frequency counting

def preprocess_text(text):
    """
    Cleans, tokenizes, removes stopwords, lemmatizes, and filters
    low-frequency words from raw text.
    """
    # Cleaning
    text = re.sub(r'[^a-zA-Z\s]', ' ', text)  # Remove non-letter characters
    text = text.lower()  # Convert to lowercase
    # Tokenization
    nltk.download('punkt')  # Download the Punkt tokenizer (if not already present)
    tokens = nltk.word_tokenize(text)  # Split text into tokens
    # Stopword removal
    nltk.download('stopwords')  # Download the stopword list (if not already present)
    stop_words = set(stopwords.words('english'))  # English stopwords
    stop_words.update({"x", "p"})  # Add custom stopwords if needed
    filtered_tokens = [token for token in tokens if token not in stop_words]
    # Lemmatization
    nltk.download('wordnet')  # Download WordNet data (if not already present)
    lemmatizer = WordNetLemmatizer()
    lemmatized_tokens = [lemmatizer.lemmatize(token) for token in filtered_tokens]
    # Low-frequency word removal
    word_counts = Counter(lemmatized_tokens)  # Count word frequencies
    min_count = 2  # Minimum word-frequency threshold
    return [token for token in lemmatized_tokens if word_counts[token] >= min_count]

def extract_topics(filenames, num_topics=20):
    """
    Extracts topics from multiple text files using LDA topic modeling.
    """
    processed_corpus = []
    for filename in filenames:
        with open(filename, "r", encoding="utf-8") as f:
            text = f.read()
        processed_corpus.append(preprocess_text(text))  # Preprocess and collect
    # Create dictionary and bag-of-words corpus
    dictionary = corpora.Dictionary(processed_corpus)
    corpus = [dictionary.doc2bow(text) for text in processed_corpus]
    # Train the LDA model
    lda_model = models.LdaModel(corpus, id2word=dictionary, num_topics=num_topics)
    # Print the top keywords of each topic
    print(lda_model.print_topics())

# Example usage
filenames = ["eco1.txt", "eco2.txt", "eco3.txt"]
extract_topics(filenames)
Here’s what I have tried so far:
- Upgraded SciPy to the latest version.
- Reinstalled both SciPy and Gensim.
- Checked my code for circular imports or naming conflicts.
- Created a new virtual environment and reinstalled all dependencies.
However, the problem persists. The traceback shows the failing import is inside Gensim itself (gensim/matutils.py does "from scipy.linalg import get_blas_funcs, triu"), so I suspect an incompatibility between Gensim 4.3.2 and SciPy 1.13.0 rather than anything in my code.
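As a temporary experiment (a workaround sketch, not a fix I have verified against all of Gensim's functionality), aliasing NumPy's triu into scipy.linalg before anything imports Gensim makes the import resolve, since numpy.triu returns the same upper-triangular matrix that scipy.linalg.triu used to:

```python
import numpy as np
import scipy.linalg

# Hypothetical shim: restore the name that gensim 4.3.2 expects.
# This must run before `from gensim import corpora, models`.
if not hasattr(scipy.linalg, "triu"):
    scipy.linalg.triu = np.triu

from scipy.linalg import triu  # now resolves regardless of SciPy version
print(triu(np.ones((2, 2))).tolist())
```

I would still prefer to understand the root cause rather than patch a third-party module at runtime.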
Has anyone encountered a similar issue? Are there any solutions or troubleshooting tips you could suggest?