Named Entity Recognition in Java - Data Mining/Machine Learning Related Project

$100-350 USD

Closed
Posted almost 15 years ago

Paid on delivery
You will be required to program in Java. You may either modify the existing program or write your own from scratch. Candidates also need reasonably good experience with data mining and machine learning. If that describes you, please read on.

## Deliverables

The design of this project must be based on the following paper: <[login to view URL]>. Please read it first, as it will give you an idea of what is expected. Note that you will not be required to implement all of the features in that paper for the NEC part, only the important ones (around 3 features).

You will need to post daily updates on progress and subtask completion; this is strictly required for the project.

Please note that this project involves writing a 15-page report in addition to developing on an existing Java program (or writing your own).

At the moment I have a Java program which reads in some MUC-7 training data and produces a standard ARFF file. The training data is fully tagged data from the MUC corpus and is included in the attached files. The ARFF format was developed by the Machine Learning Project at the Department of Computer Science of The University of Waikato for use with the WEKA machine learning software. Please refer to [login to view URL]~ml/weka/[login to view URL] for more details about ARFF files if you are not sure. This is very important for the project. If you do not have WEKA installed on your PC, you will need to install it as a tool for the program. It can be downloaded from [login to view URL] for free.

Currently the program produces a standard ARFF file in the NER\src\wekadata folder after compiling and running NER\src\view\[login to view URL] from the attached file.

The header of the ARFF file is made up of:

1) A list of attributes: in this case the entire vocabulary of the training data, where each attribute can take the value 0 or 1.
2) Punctuation marks, which are counted as part of the vocabulary as well.
3) Three imaginary empty tokens at the beginning of the training data and three imaginary tokens at the end of the training data.
4) One unknown token (the use of 3) and 4) will be explained later).

The data part of the ARFF file uses the sparse data representation. If you are not familiar with it, you can refer to [login to view URL]~ml/weka/[login to view URL] and scroll further down the page for a detailed introduction. I would like to explain a bit further though, so please take a look at the example I have given in the attached file named [login to view URL].

That is what the program does now. Once you have understood it fully, please read on for what I would like you to deliver next:

1. Train and test using Weka. Create an ARFF file for the training data, then run Weka in 10-fold cross-validation mode (using a variety of classifiers) and see what scores you get. This will be part of your results.
2. From a programming point of view, you also need to work out how to save a trained model from Weka and then run it on new, untagged data. Perhaps the easiest way to do this is from the command line (see Chapter 1 in the attached). Alternatively, you can invoke Weka from within a Java program.
3. In the end you want to NE-tag a new untagged text. To do this you need to convert the untagged text to ARFF, run it through a Weka classifier (which will add B, I and O tags), and then convert the ARFF file back to regular text, inserting <ENAMEX> tags as you go.

You are recommended to modify the existing Java files, as they are quite straightforward to follow. You could also write your own. In either case you are required to comment your code and clearly indicate what each function/method and section of your program does. If you find anything vague in the requirements, please contact me promptly and we will resolve it together.

The report will consist of 3 chapters:

Chapter 1: Requirements and analysis
This should state, in more detail, the objectives of the project as requirements, and the analysis should break the problem down into manageable steps. There may be more than one suitable approach; the analysis may cover more of the area than is finally implemented. Testing and evaluation should be given due consideration. It is important that you state how you will evaluate your work. For a design project it is appropriate to consider testing at the same time as specification.

Chapter 2: Design
This should explain the design technique chosen from the various ones available and justify why it is appropriate; it should select a suitable subset of the things described in the analysis chapter and develop a design. Where trade-offs exist between different designs, the chosen approach should be justified. Suitable diagram techniques (e.g. UML, other drawings) should be used where appropriate. If a method is applied selectively, explain which parts were used and why. Experimental projects should pay careful attention to control conditions, samples selected, etc. to ensure a valid result.

Chapter 3: Implementation and testing
In addition to illustrating "coding traps", this should highlight particularly novel aspects of the algorithms. Testing should follow the scheme presented in the analysis chapter and should follow some suitable model, e.g. category partition or state machine-based. Both functional testing and user-acceptance testing are appropriate. For experimental/investigative projects, the techniques developed should be evaluated against a standard result set for calibration, as well as the "live" data set. For theoretical projects, the relative power/expressiveness of the theory should be evaluated with respect to competing approaches.

Again, please bear in mind that this project will require daily updates on what has been done, so please make sure you are OK with that before placing a bid.
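
The last step of deliverable 3 above (turning B/I/O labels back into text with <ENAMEX> tags) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the existing program's code: the class name BioToEnamex and the method tag are hypothetical, the labels are assumed to be the plain strings "B", "I" and "O" already read back from the classifier output, no TYPE attribute is emitted, and tokens are simply joined with single spaces rather than restoring the original whitespace.

```java
import java.util.List;

// Hypothetical helper, not part of the attached NER program.
class BioToEnamex {

    /**
     * Rebuilds a text line from tokens and their B/I/O labels,
     * wrapping each entity span in <ENAMEX>...</ENAMEX>.
     * Assumption: labels are exactly "B", "I" or "O".
     */
    static String tag(List<String> tokens, List<String> labels) {
        if (tokens.size() != labels.size()) {
            throw new IllegalArgumentException("tokens and labels must have the same length");
        }
        StringBuilder out = new StringBuilder();
        boolean inside = false; // true while an entity span is open
        for (int i = 0; i < tokens.size(); i++) {
            String label = labels.get(i);
            // An "I" with no open span is treated as the start of a new span.
            boolean begins = label.equals("B") || (label.equals("I") && !inside);
            boolean continues = label.equals("I") && inside;
            if (inside && !continues) {
                out.append("</ENAMEX>"); // close the span before this token
                inside = false;
            }
            if (i > 0) {
                out.append(' ');
            }
            if (begins) {
                out.append("<ENAMEX>");
                inside = true;
            }
            out.append(tokens.get(i));
        }
        if (inside) {
            out.append("</ENAMEX>"); // close a span that runs to the end
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("Mr.", "John", "Smith", "visited", "Paris", ".");
        List<String> labels = List.of("O", "B", "I", "O", "B", "O");
        System.out.println(tag(tokens, labels));
    }
}
```

Treating a stray "I" as the start of a span is one possible way to tolerate classifier mistakes; a stricter version could reject such sequences instead.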
Project ID: 3771431

About the project

1 bid
Remote project
Active 15 years ago

1 freelancer bid an average of $276 USD for this job
See private message.
$276.25 USD in 3 days
4.9 (25 reviews)
4.0
4.0

About the client

United Kingdom
5.0
4
Member since: Feb 23, 2009
