How can I filter with the best performance? (Java)

https://stackoverflow.com/questions/2083750

21-09-2019

Question

I work in a small office. I have an application that generates a large text file with 14000 lines;

after each generation I have to filter it, which is very tedious.

I want to write a Java application so that I can do this as quickly as possible.

Please help me. I wrote an application with Scanner (with help, of course :)) but it is no good because it was very slow.

For example, this is my file:

SET CELL:NAME=CELL:0,CELLID=3;
SET LSCID:NAME=LSC:0,NETITYPE=MDCS,T32=5,EACT=FILTER-NOFILTER-MINR-FILTER-NOFILTER,ENSUP=GV2&NCR,MINCELL=6,MSV=PFR,OVLHR=9500,OTHR=80,BVLH=TRUE,CELLID=3,BTLH=TRUE,MSLH=TRUE,EIHO=DISABLED,ENCHO=ENABLED,NARD=NAP_STLP,AMH=ENABLED(3)-ENABLED(6)-ENABLED(9)

and I want this output (filtered):

CELLID :  3
ENSUP  :  GV2&NCR
ENCHO  :  ENABLED
MSLH   :  TRUE
------------------------
Count of CELLID : 2

Which solution is the best and the fastest?

This is my source code:

public static void main(String[] args) throws FileNotFoundException {
        Scanner scanner = new Scanner(new File("i:\\1\\2.txt"));
        scanner.useDelimiter(";|,");
        Pattern words = Pattern.compile("(CELLID=|ENSUP=|ENCHO=)");

        while (scanner.hasNextLine()) {
          String key = scanner.findInLine(words);

          while (key != null) {
            String value = scanner.next();
            if (key.equals("CELLID="))
              System.out.print("CELLID:" + value + "\n");
            //continue with else ifs for other keys
            else if (key.equals("ENSUP="))
              System.out.print("ENSUP:" + value + "\n");
            else if (key.equals("ENCHO="))
              System.out.print("ENCHO:" + value + "\n");
            key = scanner.findInLine(words);
          }
          scanner.nextLine();
        }

}

Thank you very much indeed ...

Solution

Since your code has a performance problem, you first have to find the bottleneck. You can profile it with the profiler available in the IDE you use.
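If no profiler is at hand, a rough first check is to time the phases separately. The following is only a sketch under that assumption: the class name, the helper elapsedMillis and the synthetic workload are hypothetical, and wall-clock timing like this is far cruder than a real profiler.

```java
class TimingSketch {

    // Runs the task once and returns the elapsed wall-clock time in milliseconds.
    static long elapsedMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        StringBuilder sink = new StringBuilder();

        // Phase 1: build the output in memory only (no I/O).
        long buildMs = elapsedMillis(() -> {
            for (int i = 0; i < 100_000; i++) {
                sink.append("CELLID:").append(i).append('\n');
            }
        });

        // Phase 2: write the same output to the console.
        long printMs = elapsedMillis(() -> System.out.print(sink));

        // Report on stderr so the timings do not mix with the data.
        System.err.println("build: " + buildMs + " ms, print: " + printMs + " ms");
    }
}
```

If the second number dominates, the output side is the place to look first.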

However, since your code is not very compute-intensive but I/O-intensive, both in reading the file and in writing the output with System.out.print, the file I/O is where I would suggest you improve it.
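On the output side, one option is to put a buffer between the program and System.out. This is a sketch, not part of the original answer: the class name, the writeLines helper and the 64 KB buffer size are assumptions chosen for illustration, and it presumes that many small print calls account for a noticeable share of the run time.

```java
import java.io.BufferedWriter;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.Writer;

class BufferedOutputSketch {

    // Writes `count` sample lines through a buffer and flushes once at the end.
    static void writeLines(Writer target, int count) {
        // 64 KB is an arbitrary buffer size picked for this sketch.
        PrintWriter out = new PrintWriter(new BufferedWriter(target, 64 * 1024));
        for (int i = 0; i < count; i++) {
            out.print("CELLID:" + i + "\n");
        }
        out.flush(); // nothing reaches the target until the buffer fills or is flushed
    }

    public static void main(String[] args) {
        writeLines(new OutputStreamWriter(System.out), 5);
    }
}
```

The same PrintWriter could replace the System.out.print calls in the question's loop, as long as it is flushed before the program exits.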

Replace this line of code

Scanner scanner = new Scanner(new File("i:\\1\\2.txt"));

with these lines of code

File file = new File("i:\\1\\2.txt");
BufferedReader br = new BufferedReader( new FileReader(file)  );
Scanner scanner = new Scanner(br);

Let us know if this helps.

Since the above solution did not help much, I have made a few changes to improve your code. You may need to fix parsing errors, if there are any. I was able to print the output of parsing 392832 lines in about 5 seconds. The original solution takes more than 50 seconds.

The changes are as follows:

Using StringTokenizer instead of Scanner
Using BufferedReader to read the file
Using StringBuilder to buffer the output

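import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.StringTokenizer;

public class FileParse {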

    private static final int FLUSH_LIMIT = 1024 * 1024;
    private static StringBuilder outputBuffer = new StringBuilder(
            FLUSH_LIMIT + 1024);
    private static long countCellId; // not final: incremented for every CELLID token

    public static void main(String[] args) throws IOException {
        long start = System.currentTimeMillis();
        String fileName = "i:\\1\\2.txt";
        File file = new File(fileName);
        // try-with-resources closes the reader even if parsing throws
        try (BufferedReader br = new BufferedReader(new FileReader(file))) {
            String line;
            while ((line = br.readLine()) != null) {
                StringTokenizer st = new StringTokenizer(line, ";|, ");
                while (st.hasMoreTokens()) {
                    String token = st.nextToken();
                    processToken(token);
                }
            }
        }
        flushOutputBuffer();
        System.out.println("----------------------------");
        System.out.println("CELLID Count: " + countCellId);
        long end = System.currentTimeMillis();
        System.out.println("Time: " + (end - start));
    }

    private static void processToken(String token) {
        if (token.startsWith("CELLID=")) {
            String value = getTokenValue(token);
            outputBuffer.append("CELLID:").append(value).append("\n");
            countCellId++;
        } else if (token.startsWith("ENSUP=")) {
            String value = getTokenValue(token);
            outputBuffer.append("ENSUP:").append(value).append("\n");
        } else if (token.startsWith("ENCHO=")) {
            String value = getTokenValue(token);
            outputBuffer.append("ENCHO:").append(value).append("\n");
        }
        if (outputBuffer.length() > FLUSH_LIMIT) {
            flushOutputBuffer();
        }
    }

    private static String getTokenValue(String token) {
        int start = token.indexOf('=') + 1;
        int end = token.length();
        String value = token.substring(start, end);
        return value;
    }

    private static void flushOutputBuffer() {
        System.out.print(outputBuffer);
        outputBuffer.setLength(0); // reuse the existing buffer instead of allocating a new one
    }

}
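The program above prints KEY:value pairs, while the question asks for aligned columns and also lists MSLH. The following sketch shows one way to produce that layout. It is an illustration under assumptions, not the answer's code: the class and method names are hypothetical, and the 6-character key column is inferred from the sample output in the question.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.StringTokenizer;

class AlignedFilterSketch {

    // Keys taken from the desired output shown in the question.
    private static final Set<String> WANTED =
            new HashSet<>(Arrays.asList("CELLID", "ENSUP", "ENCHO", "MSLH"));

    // Assumed column width: the longest key in the sample output has 6 characters.
    private static final int KEY_WIDTH = 6;

    // Appends one aligned line per wanted key found in `line`
    // and returns the number of CELLID entries seen.
    static int filterLine(String line, StringBuilder out) {
        int cellIds = 0;
        StringTokenizer st = new StringTokenizer(line, ";, ");
        while (st.hasMoreTokens()) {
            String token = st.nextToken();
            int eq = token.indexOf('=');
            if (eq < 0) {
                continue; // not a KEY=value token, for example "SET"
            }
            String key = token.substring(0, eq);
            if (!WANTED.contains(key)) {
                continue;
            }
            out.append(key);
            for (int i = key.length(); i < KEY_WIDTH; i++) {
                out.append(' '); // pad the key to the column width
            }
            out.append(" :  ").append(token, eq + 1, token.length()).append('\n');
            if (key.equals("CELLID")) {
                cellIds++;
            }
        }
        return cellIds;
    }

    public static void main(String[] args) {
        StringBuilder out = new StringBuilder();
        int count = filterLine("SET CELL:NAME=CELL:0,CELLID=3;", out);
        count += filterLine("SET LSCID:NAME=LSC:0,ENSUP=GV2&NCR,CELLID=3,ENCHO=ENABLED", out);
        System.out.print(out);
        System.out.println("------------------------");
        System.out.println("Count of CELLID : " + count);
    }
}
```

The padding is done by hand rather than with String.format to keep the per-token cost low; lines are emitted in the order the keys appear in the input.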

Update on ENSUP and MSLH:

It looks to me as if you have swapped ENSUP and MSLH in the if statement, as shown below. That is why you see the "MSLH" value for "ENSUP" and vice versa.

} else if (token.startsWith("MSLH=")) {
    String value = getTokenValue(token);
    outputBuffer.append("ENSUP:").append(value).append("\n");
} else if (token.startsWith("ENSUP=")) {
    String value = getTokenValue(token);
    outputBuffer.append("MSLH:").append(value).append("\n");
}
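With each label matched to the key its branch tests for, the two branches would look roughly like this. The wrapper class and the label method are hypothetical and exist only to make the sketch self-contained; in FileParse the same fix is just swapping the two string literals.

```java
class LabelFixSketch {

    // Returns "KEY:value" for the two keys discussed, or null for any other token.
    static String label(String token) {
        String value = token.substring(token.indexOf('=') + 1);
        if (token.startsWith("MSLH=")) {
            return "MSLH:" + value;   // label now matches the tested key
        } else if (token.startsWith("ENSUP=")) {
            return "ENSUP:" + value;  // label now matches the tested key
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(label("MSLH=TRUE"));
        System.out.println(label("ENSUP=GV2&NCR"));
    }
}
```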

Other tips

Simple text filtering is probably easier to write in Perl (my choice, since I have been using it for years) or Python (which I recommend to newcomers because it is a more modern language).

Several solutions to a similar problem using Java's Scanner or StreamTokenizer were recently discussed here.
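For reference, a StreamTokenizer can be configured to split this kind of input. The sketch below is not taken from the linked discussion; the class name, the tokens helper and the choice of separators (space, comma, semicolon) are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class StreamTokenizerSketch {

    // Splits the text into tokens separated by whitespace, ',' or ';'.
    static List<String> tokens(String text) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        st.resetSyntax();              // drop the defaults (number parsing, quotes, comments)
        st.wordChars(0x21, 0xFF);      // every visible character is part of a word...
        st.whitespaceChars(0, ' ');    // ...except control characters and space,
        st.whitespaceChars(',', ',');  // the comma,
        st.whitespaceChars(';', ';');  // and the semicolon, which act as separators
        List<String> result = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    result.add(st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory reader
        }
        return result;
    }

    public static void main(String[] args) {
        for (String token : tokens("SET CELL:NAME=CELL:0,CELLID=3;")) {
            System.out.println(token);
        }
    }
}
```

Each returned token can then be checked with startsWith, in the same way as in processToken above.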

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow