Categories: stanford-nlp, ner

Stanford NER custom model accuracy testing

I am working on entity extraction using a custom model. I trained my CRF-based model on a large dataset with the following command:

java -Xmx16g -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -prop ner.prop

using these features

Property file (ner.prop)

trainFile = training_data_IOB.tsv
#serializeTo = ner-model.ser.gz
map = word=0,answer=1
useClassFeature=true
useWord=true
qnSize=10
entitySubclassification=IOB1
retainEntitySubclassification=true
mergeTags=true
useNGrams=true
noMidNGrams=true
maxNGramLeng=6
usePrev=true
useNext=true
useSequences=true
usePrevSequences=true
maxLeft=1
useTypeSeqs=true
useTypeSeqs2=true
useTypeySequences=true
wordShape=chris2useLC
useDisjunctive=true
useGazettes=true
gazette=gazetter.txt
sloppyGazette=true

Training file (training_data_IOB.tsv)

Thousands	O
of	O
demonstrators	O
have	O
marched	O
through	O
London	B-LOC
to	O
protest	O
the	O
war	O
in	O
Iraq	B-LOC
...	...

Gazette file (gazetter.txt)

B-LOC Iraq
B-LOC Afghanistan
B-ORG Congressional
B-LOC Bangladesh
B-LOC Canada
B-ORG ...

The new model is created as ner-model.ser.gz and works quite well.

Now my question is: how can I calculate its percentage accuracy on unseen (new) data without any manual counting or calculation?

I'm new to this field, so a detailed, descriptive answer would be appreciated. Thanks for your time.

The best answer:

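If you create a CoNLL-style file with the gold tags for a test set, in the same column format as your training file, you can run the following command and it will output scores: precision, recall, and F1 per entity type (this example runs our model; replace it with your custom model):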

java -Xmx2g -cp "*" edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz -testFile testData.conll
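
If you also want a single token-level accuracy percentage, as asked in the question, one option is to redirect the standard output of the command above to a file and count matching tags. The following is only a minimal sketch. It assumes that each output line holds three tab-separated columns (word, gold tag, predicted tag) and that sentences are separated by blank lines; check your own output before relying on it. The class name TokenAccuracy and the file name predictions.tsv are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class TokenAccuracy {

    /**
     * Returns the fraction of tokens whose predicted tag equals the gold tag.
     * Assumed line format (not verified here): word \t gold \t predicted.
     */
    static double accuracy(List<String> lines) {
        int total = 0;
        int correct = 0;
        for (String line : lines) {
            String[] cols = line.split("\t");
            if (cols.length < 3) {
                continue; // skip blank sentence separators and malformed lines
            }
            total++;
            if (cols[1].equals(cols[2])) {
                correct++;
            }
        }
        return total == 0 ? 0.0 : (double) correct / total;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the file holding the redirected output, e.g. predictions.tsv
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.printf("Token accuracy: %.2f%%%n", 100 * accuracy(lines));
    }
}
```

Compile it with javac TokenAccuracy.java and run it with java TokenAccuracy predictions.tsv. Keep in mind that token accuracy is dominated by the O tag, which covers most tokens, so the per-entity precision, recall, and F1 printed by the classifier are usually the more informative numbers.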
