Initial Release
Uses text vectors to compare article similarity. The vectors are generated with Python's sentence-transformers package, and similarity is calculated with IRIS's vector cosine function. The front end uses Bootstrap and CSP.
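As an illustration of what the cosine calculation measures, here is a minimal pure-Python sketch. It is not the application's code: in the application this step is performed by IRIS, and the vectors come from sentence-transformers. The function name `cosine_similarity` and the toy vectors are assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; treat it as not similar.
        return 0.0
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real sentence embeddings (hypothetical values).
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Vectors pointing in the same direction score close to 1, and unrelated (orthogonal) vectors score 0, which is why a cutoff such as 0.7 can be used to mark similar text.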
1.1 Pull the code to a local directory
$ git clone https://github.com/ADAQIQI/AriticleSimilarity.git
1.2 Download a model from https://huggingface.co/ and place it in the same directory as the code. This guide uses the BERT model as an example. (If the CPU and memory are not sufficient, a smaller model such as MiniLM can be considered.)
1.3 If you use another model, modify the Dockerfile:
change the model folder name to the name of the folder you created.
1.4 docker-compose up -d
If the model used is not BERT, modify GetEmbeddingPy in Article Similarity Vector and change the directory to the model folder used in 1.2.
zpm install ariticlesimilarity
As described in 1.2, place the model folder in the specified directory.
Install sentence-transformers
python3 -m pip install --target /usr/irissys/mgr/python sentence-transformers
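To confirm that the package can be found after installation, a small check like the following may help. This is a sketch, not part of the project: the helper name `is_installed` is hypothetical, and the optional `extra_path` argument assumes the target directory from the command above has to be added to the search path when the check runs outside IRIS.

```python
import importlib
import importlib.util
import sys


def is_installed(module_name, extra_path=None):
    """Return True if a top-level module can be located on the import path."""
    if extra_path and extra_path not in sys.path:
        # Make the pip --target directory visible to this interpreter.
        sys.path.append(extra_path)
        importlib.invalidate_caches()
    return importlib.util.find_spec(module_name) is not None


# Example call (path taken from the install command above):
# is_installed("sentence_transformers", "/usr/irissys/mgr/python")
```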
Open the homepage
http://localhost:52773/csp/user/Index.csp
Enter or upload the title and content of the article. The demonstration text was generated by Google Bard.
After clicking Upload, wait for the page to jump to the paragraph similarity comparison. (If the page reports an error because the response takes too long, wait a while, refresh the page, and switch directly to the similarity comparison tab.) Meanwhile, the background process continues to split the article into sentences and compare them sentence by sentence.
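The sentence-splitting step can be pictured with the sketch below. The source does not describe the rule the application actually uses, so the regular expression (split after `.`, `!`, or `?` followed by whitespace) and the name `split_sentences` are assumptions for illustration only.

```python
import re


def split_sentences(text):
    """Split text into sentences after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Drop empty fragments left over from blank input.
    return [p for p in parts if p]


sentences = split_sentences("Vectors encode meaning. Are they similar? Compare them!")
```

Each resulting sentence would then be embedded and compared against the sentences of the other article.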
After refreshing the page, click the Sentence Similarity button, select the article title, and click the button to view the sentence similarity comparison. Sentences with a similarity higher than 0.7 (the threshold can be modified in the BO class) are highlighted in color.
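The thresholding described above can be sketched as follows. This is an illustrative example rather than the project's BO class: the function name `flag_similar_pairs`, the score matrix layout, and the sample scores are all assumptions; only the 0.7 default comes from the text.

```python
def flag_similar_pairs(scores, threshold=0.7):
    """Return (i, j, score) for every sentence pair scoring above the threshold.

    scores[i][j] is the similarity between sentence i of one article
    and sentence j of the other.
    """
    flagged = []
    for i, row in enumerate(scores):
        for j, score in enumerate(row):
            if score > threshold:  # strictly higher than the cutoff
                flagged.append((i, j, score))
    return flagged


# Hypothetical similarity scores for two articles of two sentences each.
example_scores = [
    [0.90, 0.20],
    [0.70, 0.71],
]
highlighted = flag_similar_pairs(example_scores)
```

Raising the threshold marks fewer pairs and lowering it marks more, so the value is a trade-off between missing paraphrases and highlighting unrelated sentences.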