Question

I'm looking to use spatial to locate vehicles within x miles of a given zip code. I'd like to use two tables, vehicle_listing and zip_code_detail where vehicle_listing has a ManyToOne relationship with zip_code_detail. My address table is made up of the entire zip code database which contains long/lat etc.

  1. Will spatial work properly with a join, or should I include long/lat within vehicle_listing?
  2. If I used @IndexedEmbedded on my ManyToOne relationship and @Indexed on zip_code_detail, would the entire zip_code_detail table be indexed, or just the zip_code_detail records being joined?

I'm looking for a database design with the best performance while minimizing memory consumption and ideally reducing data duplication.

Entity design, using MySQL as the database:

@Entity
public class ZipDetail implements Serializable {

    @Id 
    @Column(length = 5)
    private String zip; 

    private String city;

    @ManyToOne
    @JoinColumn(name = "state_id")
    private State state;

    @ManyToOne
    @JoinColumn(name = "county_id")
    private County county;

    @NonVisual
    private String areaCodes;

    @NonVisual
    private Double latitude;

    @NonVisual
    private Double longitude;

    private String country;

VehicleListing.class

@Indexed
@Spatial(spatialMode = SpatialMode.GRID)
public class VehicleListing extends BaseEntity {


    @NonVisual
    @Latitude
    private Double latitude;

    @NonVisual
    @Longitude
    private Double longitude;

    @IndexedEmbedded
    @ManyToOne
    @JoinColumn(name = "year_id", nullable = false)
    private VehicleYear vehicleYear;

    @IndexedEmbedded
    @ManyToOne
    @JoinColumn(name = "make_id", nullable = false)
    private VehicleMake vehicleMake;

    @ManyToOne
    @JoinColumn(name = "zip_detail_id", nullable = false)
    private ZipDetail zipDetail;

Solution

I've provided a SQL Server (T-SQL) solution, as I'm not that well versed in MySQL, but I hope it will be of help to you, i.e. you can reverse engineer it into a similar MySQL solution.

Will spatial work properly with a join, or should I include long/lat within vehicle_listing?

In short, yes, it will work fine. When you join the tables, any query using information from both tables can use the appropriate indexes on each table to filter the rows, which keeps performance high without duplicating data (duplication should always be minimised in any good data model).

Naturally, you'd expect to see a small improvement in performance if you store the latitude/longitude coordinates at the vehicle level, because the query no longer has the overhead of a join. However, you would then have to maintain lat/longs on every vehicle (rather than just the association to a ZIP code), and you would force far more work onto the spatial index (assuming you have more vehicles than ZIP codes), which I would ultimately expect to degrade performance. Unless you know for a fact that it will never happen, I would assume that you will eventually have more vehicles than ZIP codes, given that ZIP codes do not change that often.

So, assuming the following (ultra-simplified for the example), I would do something like this (these were written before you posted the classes but are still relevant):

CREATE TABLE [Vehicles]
(
[Id] INT,
[ZipCodeDetailId] INT -- Foreign Key on [Zip_Code_Detail].[Id] (Also create Index here)
);

CREATE TABLE [Zip_Code_Detail]
(
[Id] INT,
[Location] GEOGRAPHY -- Ensure spatial index on here
);

You could then write the following:

DECLARE @searchDistance FLOAT = 1000; -- Distance in metres
DECLARE @searchFrom GEOGRAPHY = GEOGRAPHY::STPointFromText('POINT(12.3456 56.7890)', 4326);

SELECT
COUNT(*)
FROM [Vehicles] V
JOIN [Zip_Code_Detail] ZIP ON ZIP.[Id] = V.[ZipCodeDetailId]
WHERE
ZIP.[Location].STDistance(@searchFrom) <= @searchDistance;
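
For anyone who wants to see what the query above is doing without a spatial database, here is a rough Java sketch of the same idea: filter the ZIP points by great-circle distance first (one distance calculation per ZIP code), then count the vehicles that reference a matching ZIP. It assumes ZIP codes are stored as centre points and uses the haversine formula on a spherical Earth, which is only an approximation and is not what STDistance or a spatial index does internally. The names (ZipRadiusSketch, ZipPoint, Vehicle, countVehiclesWithin) and the sample coordinates are made up for illustration. It also converts miles to metres, since the question asks for miles while the query above works in metres.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only: every name here is hypothetical, not from the original post.
final class ZipRadiusSketch {

    static final double EARTH_RADIUS_METRES = 6_371_000.0; // mean radius, spherical approximation
    static final double METRES_PER_MILE = 1_609.344;

    /** A ZIP code stored as a single centre point rather than a polygon boundary. */
    static final class ZipPoint {
        final String zip;
        final double latitude;
        final double longitude;

        ZipPoint(String zip, double latitude, double longitude) {
            this.zip = zip;
            this.latitude = latitude;
            this.longitude = longitude;
        }
    }

    /** A vehicle that only references its ZIP code, mirroring the join-based design. */
    static final class Vehicle {
        final long id;
        final String zip;

        Vehicle(long id, String zip) {
            this.id = id;
            this.zip = zip;
        }
    }

    /** Great-circle distance in metres between two lat/long points (haversine formula). */
    static double haversineMetres(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_METRES * Math.asin(Math.sqrt(a));
    }

    /** Counts vehicles whose ZIP centre point lies within the given number of miles. */
    static long countVehiclesWithin(List<Vehicle> vehicles, List<ZipPoint> zips,
                                    double fromLat, double fromLon, double miles) {
        double limitMetres = miles * METRES_PER_MILE;

        // Step 1: spatial filter on the (smaller) ZIP table, one distance check per ZIP.
        Set<String> nearbyZips = new HashSet<>();
        for (ZipPoint z : zips) {
            if (haversineMetres(fromLat, fromLon, z.latitude, z.longitude) <= limitMetres) {
                nearbyZips.add(z.zip);
            }
        }

        // Step 2: the "join", i.e. keep vehicles whose ZIP passed the filter.
        long count = 0;
        for (Vehicle v : vehicles) {
            if (nearbyZips.contains(v.zip)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up sample data: two ZIP points north of the search origin (0, 0).
        List<ZipPoint> zips = List.of(
                new ZipPoint("NEAR", 0.1, 0.0),
                new ZipPoint("FAR", 1.0, 0.0));
        List<Vehicle> vehicles = List.of(
                new Vehicle(1, "NEAR"), new Vehicle(2, "NEAR"), new Vehicle(3, "FAR"));

        System.out.println(countVehiclesWithin(vehicles, zips, 0.0, 0.0, 10.0));
        System.out.println(countVehiclesWithin(vehicles, zips, 0.0, 0.0, 70.0));
    }
}
```

The point of doing the distance check on the ZIP list first is that the expensive part scales with the number of ZIP codes rather than the number of vehicles; a real database would use the spatial index for step 1 instead of scanning every ZIP.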

In SQL Server, on a point database of over 2 million records and with a random search distance, I get sub-2-second responses returning over 1,000 results. You should get far better times with a smaller database, and my index is geared for multiple geometry types, not just points.

I've answered based on several assumptions here:

  1. You are representing ZIP Codes as 5-Digit which means your table has approx 40,000 records.
  2. You are representing ZIP Codes as central points rather than polygon boundaries.
  3. The vehicles are assumed to be static (for example at the home address for the purpose of the query) and NOT in motion (which would require spatial data with "timestamps" on a separate table altogether).

Hope it helps in some way.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow