Difficulty of “learning” rare instances

8 votes
Is there any result showing that models (say SVMs, neural networks, kNN, etc.) will have difficulty learning "rare" instances / tail phenomena?







  • I think you accepted an answer too quickly -- it's a good answer, but there could be more possibilities out there as well.
    – usul
    Aug 8 at 8:35

  • @usul thanks for the comment. Do you have any further suggestion? Would be happy to hear additional thoughts.
    – Daniel
    Aug 8 at 14:13
asked Aug 7 at 15:53
– Daniel
1 Answer






10 votes, accepted










In the classic PAC learning (i.e., classification) model, rare instances are not a problem. This is because the learner's test points are assumed to come from the same distribution as the training data. Thus, if a region of space is so sparse as to be poorly represented in the training sample, its probability of appearing during the test phase is low.
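A quick simulation (with made-up numbers) illustrates this: if a classifier errs only on a region of probability mass p, its expected error on test points drawn from the same distribution is exactly p, no matter how badly it "learns" that region.

```python
import random

random.seed(0)

P_RARE = 0.01      # probability mass of the rare region (illustrative)
N_TEST = 100_000   # test points drawn from the training distribution

# A classifier that is wrong on every rare instance and right everywhere
# else has true risk exactly P_RARE under the same distribution.
# Each draw below marks whether a test point fell in the rare region.
errors = sum(random.random() < P_RARE for _ in range(N_TEST))
test_error = errors / N_TEST

print(test_error)  # close to P_RARE, i.e. about 0.01
```

So under the standard i.i.d. test assumption, the rare region drags the measured error down toward its own (tiny) probability mass, which is exactly why PAC-style guarantees are indifferent to it.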



You'll need a different learning model, one that explicitly accounts for type-I and type-II errors, or perhaps some combined precision-recall score. Here again, I don't think there are any results indicating that a specific class of algorithms is particularly poorly suited to this task, but I could be wrong.
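To see why plain accuracy hides tail behavior while recall / type-II error exposes it, consider a degenerate classifier that always predicts the majority class on an imbalanced test set (the counts below are made up for illustration):

```python
# Confusion counts for a classifier that always predicts the majority
# class on a 990:10 imbalanced test set (illustrative numbers).
tp, fn = 0, 10     # every rare positive is missed
tn, fp = 990, 0    # every common negative is correct

accuracy = (tp + tn) / (tp + tn + fp + fn)  # high -- looks excellent
recall = tp / (tp + fn)                     # zero -- the tail is never found
type_ii_error = fn / (fn + tp)              # one  -- all misses are rare instances

print(accuracy, recall, type_ii_error)
```

Under 0-1 loss this classifier is near-optimal; only a tail-sensitive criterion registers that it has learned nothing about the rare class.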



The closest I can think of is sensitivity to outliers --- AdaBoost is known to have this property, for example.
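A small sketch of AdaBoost's reweighting rule shows the mechanism: if the weak learner misclassifies only a single outlier in a round, the standard update renormalizes so that the misclassified mass becomes 1/2, i.e. half the total weight lands on that one point. (The sample size and setup here are hypothetical.)

```python
import math

n = 100
w = [1.0 / n] * n    # uniform initial weights; index 0 is the lone outlier

# Round 1: the weak learner misclassifies only the outlier.
eps = w[0]                               # weighted training error
alpha = 0.5 * math.log((1 - eps) / eps)  # weak learner's vote weight

# Standard AdaBoost update: upweight mistakes, downweight correct points,
# then renormalize to a distribution.
w = [wi * math.exp(alpha if i == 0 else -alpha) for i, wi in enumerate(w)]
total = sum(w)
w = [wi / total for wi in w]

print(w[0])  # 0.5 (up to rounding): half the mass now sits on the outlier
```

This is the sense in which boosting is outlier-sensitive: subsequent rounds are forced to spend their capacity fitting the point every weak learner gets wrong.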


























answered Aug 7 at 17:59
– Aryeh






















             
